Apache Spark and Scala Hadoop

This training course delivers the key concepts and expertise developers need to use Apache Spark to develop high-performance parallel applications. Participants will learn how to use Spark SQL to query structured data and Spark Streaming to perform real-time processing on streaming data from a variety of sources. Developers will also practice writing applications that use core Spark to perform ETL processing and iterative algorithms. The course covers how to work with “big data” stored in a distributed file system, and execute Spark applications on a Hadoop cluster. After taking this course, participants will be prepared to face real-world challenges and build applications to execute faster decisions, better decisions, and interactive analysis, applied to a wide variety of use cases, architectures, and industries.

request

Can’t find a batch you were looking for?

 

This training course delivers the key concepts and expertise developers need to use Apache Spark to develop high-performance parallel applications. Participants will learn how to use Spark SQL to query structured data and Spark Streaming to perform real-time processing on streaming data from a variety of sources. Developers will also practice writing applications that use core Spark to perform ETL processing and iterative algorithms. The course covers how to work with “big data” stored in a distributed file system, and execute Spark applications on a Hadoop cluster. After taking this course, participants will be prepared to face real-world challenges and build applications to execute faster decisions, better decisions, and interactive analysis, applied to a wide variety of use cases, architectures, and industries.

Course content

 

1. Introduction
  • Introduction to Apache Hadoop and the Hadoop Ecosystem
  • Apache Hadoop Overview
  • Data Processing
  • Introduction to the Hands-On Exercises
2. Apache Hadoop File Storage
  • Apache Hadoop Cluster Components
  • HDFS Architecture
  • Using HDFS
3. Distributed Processing on an Apache Hadoop Cluster
  • YARN Architecture
  • Working With YARN
4. Apache Spark Basics
  • What is Apache Spark?
  • Starting the Spark Shell
  • Using the Spark Shell
  • Getting Started with Datasets and DataFrames
  • DataFrame Operations
5.  Working with DataFrames and Schemas
  • Creating DataFrames from Data Sources
  • Saving DataFrames to Data Sources
  • DataFrame Schemas
  • Eager and Lazy Execution
6.  Analyzing Data with DataFrame Queries
  • Querying DataFrames Using Column Expressions
  • Grouping and Aggregation Queries
  • Joining DataFrames
7.  RDD Overview
  • RDD Overview
  • RDD Data Sources
  • Creating and Saving RDDs
  • RDD Operations
8.  Transforming Data with RDDs
  • Writing and Passing Transformation Functions
  • Transformation Execution
  • Converting Between RDDs and DataFrames
9. Aggregating Data with Pair RDDs
  • Key-Value Pair RDDs
  • Map-Reduce
  • Other Pair RDD Operations
10. Querying Tables and Views with SQL
  • Querying Tables in Spark Using SQL
  • Querying Files and Views
  • The Catalog API
11. Working with Datasets in Scala
  • Datasets and DataFrames
  • Creating Datasets
  • Loading and Saving Datasets
  • Dataset Operations
12. Writing, Configuring, and Running Spark Applications
  • Writing a Spark Application
  • Building and Running an Application
  • Application Deployment Mode
  • The Spark Application Web UI
  • Configuring Application Properties
13. Spark Distributed Processing
  • Review: Apache Spark on a Cluster
  • RDD Partitions
  • Example: Partitioning in Queries
  • Stages and Tasks
  • Job Execution Planning
  • Example: Catalyst Execution Plan
  • Example: RDD Execution Plan
14. Distributed Data Persistence
  • DataFrame and Dataset Persistence
  • Persistence Storage Levels
  • Viewing Persisted RDDs
15. Common Patterns in Spark Data Processing
  • Common Apache Spark Use Cases
  • Iterative Algorithms in Apache Spark
  • Machine Learning
  • Example: k-means
16. Introduction to Structured Streaming
  • Apache Spark Streaming Overview
  • Creating Streaming DataFrames
  • Transforming DataFrames
  • Executing Streaming Queries
17. Structured Streaming with Apache Kafka
  • Overview
  • Receiving Kafka Messages
  • Sending Kafka Messages
18. Aggregating and Joining Streaming DataFrames
  • Streaming Aggregation
  • Joining Streaming DataFrames
19. Conclusion
  • Message Processing with Apache Kafka
  • What Is Apache Kafka?
  • Apache Kafka Overview
  • Scaling Apache Kafka
  • Apache Kafka Cluster Architecture
  • Apache Kafka Command Line Tools

 

 

To see the full course content Download now

Course Prerequisites

 
  • This course is designed for developers and engineers who have programming experience, but prior knowledge of Spark and Hadoop is not required. Apache Spark examples and hands-on exercises are presented in Scala and Python. The ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful.

Who can attend

 
  • Any one from developer background with Hadoop knowledge is preferable.

Number of Hours: 40hrs

Certification

CCA Spark and Hadoop Developer (CCA 175)

Key features

  • One to One Training
  • Online Training
  • Fastrack & Normal Track
  • Resume Modification
  • Mock Interviews
  • Video Tutorials
  • Materials
  • Real Time Projects
  • Virtual Live Experience
  • Preparing for Certification

FAQs

DASVM Technologies offers 300+ IT training courses with 10+ years of Experienced Expert level Trainers.

  • One to One Training
  • Online Training
  • Fastrack & Normal Track
  • Resume Modification
  • Mock Interviews
  • Video Tutorials
  • Materials
  • Real Time Projects
  • Materials
  • Preparing for Certification

Call now: +91-99003 49889 and know the exciting offers available for you!

We working and coordinating with the companies exclusively to get placed. We have a placement cell focussing on training and placements in Bangalore. Our placement cell help more than 600+ students per year.

Learn from experts active in their field, not out-of-touch trainers. Leading practitioners who bring current best practices and case studies to sessions that fit into your work schedule. We have a pool of experts and trainers are composed with highly skilled and experienced in supporting you in specific tasks and provide professional support. 24x7 Learning support from mentors and a community of like-minded peers to resolve any conceptual doubts. Our trainers has contributed in the growth of our clients as well as professionals.

All of our highly qualified trainers are industry experts with at least 10-12 years of relevant teaching experience. Each of them has gone through a rigorous selection process which includes profile screening, technical evaluation, and a training demo before they are certified to train for us. We also ensure that only those trainers with a high alumni rating continue to train for us.

No worries. DASVM technologies assure that no one misses single lectures topics. We will reschedule the classes as per your convenience within the stipulated course duration with all such possibilities. If required you can even attend that topic with any other batches.

DASVM Technologies provides many suitable modes of training to the students like:

  • Classroom training
  • One to One training
  • Fast track training
  • Live Instructor LED Online training
  • Customized training

Yes, the access to the course material will be available for lifetime once you have enrolled into the course.

You will receive DASVM Technologies recognized course completion certification & we will help you to crack global certification with our training.

Yes, DASVM Technologies provides corporate trainings with Course Customization, Learning Analytics, Cloud Labs, Certifications, Real time Projects with 24x7 Support.

Yes, DASVM Technologies provides group discounts for its training programs. Depending on the group size, we offer discounts as per the terms and conditions.

We accept all major kinds of payment options. Cash, Card (Master, Visa, and Maestro, etc), Wallets, Net Banking, Cheques and etc.

DASVM Technologies has a no refund policy. Fees once paid will not be refunded. If the candidate is not able to attend a training batch, he/she is to reschedule for a future batch. Due Date for Balance should be cleared as per date given. If in case trainer got cancelled or unavailable to provide training DASVM will arrange training sessions with other backup trainer.

Your access to the Support Team is for lifetime and will be available 24/7. The team will help you in resolving queries, during and after the course.

Please Contact our course advisor +91-99003 49889. Or you can share your queries through info@dasvmtechnologies.com

like our courses