Big Data Hadoop Development

The Hadoop Development course teaches learners the skill set required to set up a Hadoop cluster, store Big Data using the Hadoop Distributed File System (HDFS), and process/analyze that data using MapReduce programming or other Hadoop ecosystem tools. This Hadoop course will equip you with all the skills you’ll need for your next Big Data assignment. You will learn to work with Hadoop’s Distributed File System, its processing and computation frameworks, core Hadoop distributions, and vendor-specific distributions such as Cloudera. You will also learn why cluster management solutions are needed and how to set up, secure, safeguard and monitor clusters and their components such as Sqoop, Flume, Pig, Hive and HBase.


Course content

 

Introduction to Big Data and Hadoop
  • Big Data Introduction
  • Hadoop Introduction
  • What is Hadoop? Why Hadoop?
  • Hadoop History
  • Different Types of Components in Hadoop
  • HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, Zookeeper and so on…
  • What is the scope of Hadoop?
Deep Dive into HDFS (Storing the Data)
  • Introduction of HDFS
  • HDFS Design
  • HDFS role in Hadoop
  • Features of HDFS
  • Daemons of Hadoop and its functionality
  • Name Node
  • Secondary Name Node
  • Job Tracker
  • Data Node
  • Task Tracker
    • Anatomy of File Write
    • Anatomy of File Read
    • Network Topology
  • Nodes
  • Racks
  • Data Center
    • Parallel Copying using DistCp
    • Basic Configuration for HDFS
    • Data Organization
  • Blocks
  • Replication
    • Rack Awareness
    • Heartbeat Signal
    • How to Store the Data into HDFS
    • How to Read the Data from HDFS (a Java read/write sketch follows this module)
    • Accessing HDFS (Introduction of Basic UNIX commands)
    • CLI commands
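
To make the storage flow above concrete, here is a minimal sketch of writing and then reading a small file through Hadoop’s FileSystem Java API. It assumes core-site.xml (with fs.defaultFS) is on the classpath, and the path /user/demo/sample.txt is a hypothetical example; hdfs dfs -put and hdfs dfs -cat are the CLI equivalents covered in this module.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/sample.txt"); // hypothetical path

        // Write: the client streams data to a pipeline of DataNodes;
        // the NameNode only records the block locations (metadata)
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read: the client asks the NameNode for block locations,
        // then reads the blocks directly from the nearest DataNodes
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}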
MapReduce using Java (Processing the Data)
  • Introduction to MapReduce
  • MapReduce Architecture
  • Data flow in MapReduce
    • Splits
    • Mapper
    • Partitioning
    • Sort and shuffle
    • Combiner
    • Reducer
  • Understand Difference Between Block and InputSplit
  • Role of RecordReader
  • Basic Configuration of MapReduce
  • MapReduce life cycle
    • Driver Code
    • Mapper
    • Reducer
  • How MapReduce Works
  • Writing and Executing the Basic MapReduce Program using Java
  • Submission & Initialization of MapReduce Job.
  • File Input/Output Formats in MapReduce Jobs
    • Text Input Format
    • Key Value Input Format
    • Sequence File Input Format
    • NLine Input Format
  • Joins
    • Map-side Joins
    • Reducer-side Joins
  • Word Count Example (see the Java sketch after this module)
  • Partition MapReduce Program
  • Side Data Distribution
    • Distributed Cache (with Program)
  • Counters (with Program)
    • Types of Counters
    • Task Counters
    • Job Counters
    • User Defined Counters
    • Propagation of Counters
  • Job Scheduling
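
The Word Count example referenced above, written against the org.apache.hadoop.mapreduce (new) API, is sketched below; input and output paths are supplied as command-line arguments, e.g. hadoop jar wordcount.jar WordCount /input /output.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits one (word, 1) pair per token of each input line
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also reused as combiner): sums the counts for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver code: configures the job and submits it to the cluster
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}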
PIG 
  • Introduction to Apache PIG
  • Introduction to the PIG Data Flow Engine
  • MapReduce vs. PIG in detail
  • When should PIG be used?
  • Data Types in PIG
  • Basic PIG programming
  • Modes of Execution in PIG
    • Local Mode and
    • MapReduce Mode
  • Execution Mechanisms
    • Grunt Shell
    • Script
    • Embedded
  • Operators/Transformations in PIG
  • PIG UDF’s with Program
  • Word Count Example in PIG
  • Difference between MapReduce and PIG
SQOOP
  • Introduction to SQOOP
  • Use of SQOOP
  • Connect to MySQL database
  • SQOOP commands
    • Import
    • Export
    • Eval
    • Codegen etc…
  • Joins in SQOOP
  • Export to MySQL
  • Export to HBase
HIVE
  • Introduction to HIVE
  • HIVE Meta Store
  • HIVE Architecture
  • Tables in HIVE
    • Managed Tables
    • External Tables
  • Hive Data Types
    • Primitive Types
    • Complex Types
  • Partition
  • Joins in HIVE
  • HIVE UDF’s and UADF’s with Programs
  • Word Count Example (a Hive JDBC sketch follows this module)
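
As a rough illustration of querying Hive from Java, here is a sketch that runs the word-count query over a managed table through the HiveServer2 JDBC driver. The endpoint localhost:10000, the empty credentials and the table name docs are assumptions for the example.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveWordCount {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // HiveServer2 endpoint; host, port and credentials are assumptions
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement stmt = con.createStatement()) {

            // Managed table with one line of text per row (hypothetical name)
            stmt.execute("CREATE TABLE IF NOT EXISTS docs (line STRING)");

            // Word count in HiveQL: split each line into words,
            // explode the array into rows, then group and count
            String q = "SELECT word, COUNT(*) AS cnt "
                     + "FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w "
                     + "GROUP BY word";
            try (ResultSet rs = stmt.executeQuery(q)) {
                while (rs.next()) {
                    System.out.println(rs.getString("word") + "\t" + rs.getLong("cnt"));
                }
            }
        }
    }
}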
HBASE 
  • Introduction to HBASE
  • Basic Configurations of HBASE
  • Fundamentals of HBase
  • What is NoSQL?
  • HBase Data Model
    • Table and Row
    • Column Family and Column Qualifier
    • Cell and its Versioning
  • Categories of NoSQL Databases
    • Key-Value Database
    • Document Database
    • Column Family Database
  • HBASE Architecture
    • HMaster
    • Region Servers
    • Regions
    • MemStore
    • Store
  • SQL vs. NOSQL
  • How HBase differs from RDBMS
  • HDFS vs. HBase
  • Client-side buffering or bulk uploads
  • HBase Designing Tables
  • HBase Operations (a Java client sketch follows this module)
    • Get
    • Scan
    • Put
    • Delete
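
The four operations above map directly onto the HBase Java client API. Below is a minimal sketch assuming the Connection-based client (HBase 1.x/2.x), an hbase-site.xml on the classpath, and a pre-created table named users with a column family info (both names are hypothetical).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCrud {
    public static void main(String[] args) throws Exception {
        // Reads the ZooKeeper quorum and other settings from hbase-site.xml
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) { // hypothetical table

            // Put: write one cell into the 'info' column family
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
            table.put(put);

            // Get: fetch a single row by its row key
            Result row = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                    row.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

            // Scan: iterate over a range of rows (here, the whole table)
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }

            // Delete: remove the row
            table.delete(new Delete(Bytes.toBytes("row1")));
        }
    }
}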
HCatalog
  • HCatalog Installation
  • Introduction to HCatalog
  • Using HCatalog with PIG, HIVE and MapReduce
  • Hands on Exercises
MongoDB
  • What is MongoDB?
  • Where to Use?
  • Configuration on Windows
  • Inserting Data into MongoDB
  • Reading Data from MongoDB (a Java driver sketch follows this module)
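
As a sketch of the insert and read steps in this module, the snippet below uses the MongoDB Java sync driver (mongodb-driver-sync) against a local mongod on the default port; the database and collection names are hypothetical.

import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;

public class MongoQuickstart {
    public static void main(String[] args) {
        // A local mongod listening on the default port 27017 is assumed
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("training");          // hypothetical name
            MongoCollection<Document> students = db.getCollection("students");

            // Insert one document
            students.insertOne(new Document("name", "Asha").append("course", "Hadoop"));

            // Read it back with a simple equality filter
            for (Document d : students.find(new Document("course", "Hadoop"))) {
                System.out.println(d.toJson());
            }
        }
    }
}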
Cluster Setup 
  • Downloading and Installing Ubuntu 12.x
  • Installing Java
  • Installing Hadoop
  • Creating Cluster
  • Increasing/Decreasing the Cluster Size
  • Monitoring the Cluster Health
  • Starting and Stopping the Nodes
Zookeeper
  • Introduction to Zookeeper
  • Data Model
  • Operations
OOZIE 
  • Introduction to OOZIE
  • Use of OOZIE
  • Where to use?
Flume 
  • Introduction to Flume
  • Uses of Flume
  • Flume Architecture
  • Flume Master
  • Flume Collectors
  • Flume Agents
SPARK
  • Spark Overview
  • Linking with Spark, Initializing Spark
  • Using the Shell
  • Resilient Distributed Datasets (RDDs)
  • Parallelized Collections
  • External Datasets
  • RDD Operations
  • Basics, Passing Functions to Spark
  • Working with Key-Value Pairs (see the Java sketch after this module)
  • Transformations
  • Actions
  • RDD Persistence
  • Which Storage Level to Choose?
  • Removing Data
  • Shared Variables
  • Broadcast Variables
  • Accumulators
  • Deploying to a Cluster
  • Unit Testing
  • Migrating from pre-1.0 Versions of Spark
  • Where to Go from Here
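
Tying several of the Spark topics above together (external datasets, transformations, actions and key-value pairs), here is a minimal word-count sketch using the Spark 2.x Java RDD API; the input path and the local[*] master are placeholders for illustration.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        // "local[*]" runs Spark inside this JVM; on a real cluster the
        // master URL would normally come from spark-submit instead
        SparkConf conf = new SparkConf().setAppName("word-count").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // External dataset: one RDD element per line of the input file
        JavaRDD<String> lines = sc.textFile(args.length > 0 ? args[0] : "input.txt");

        // Transformations are lazy; nothing executes until an action runs
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey(Integer::sum);

        // Action: collect() triggers the computation and returns the results
        for (Tuple2<String, Integer> t : counts.collect()) {
            System.out.println(t._1() + "\t" + t._2());
        }

        sc.stop();
    }
}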

 

To see the full course content, download now.

Course Prerequisites

 
  • Basic Unix Commands
  • Core Java (OOP concepts, Collections, Exceptions) for MapReduce programming
  • SQL Query knowledge for Hive Queries

Who can attend

 
  • Working professionals in IT / Analytics / Statistics / Big Data / Machine Learning
  • Fresh graduates from Engineering / Mathematics / IT backgrounds
  • Professionals looking to develop skills to do statistical analysis to support decision making

Number of Hours: 40hrs

Certification

CCA-175 (Cloudera CCA Spark and Hadoop Developer)

Key features

  • One to One Training
  • Online Training
  • Fastrack & Normal Track
  • Resume Modification
  • Mock Interviews
  • Video Tutorials
  • Materials
  • Real Time Projects
  • Virtual Live Experience
  • Preparing for Certification

FAQs

DASVM Technologies offers 300+ IT training courses delivered by expert-level trainers with 10+ years of experience.

  • One to One Training
  • Online Training
  • Fastrack & Normal Track
  • Resume Modification
  • Mock Interviews
  • Video Tutorials
  • Materials
  • Real Time Projects
  • Preparing for Certification

Call now: +91-99003 49889 to learn about the exciting offers available for you!

We work and coordinate with companies exclusively to get our students placed. We have a placement cell focusing on training and placements in Bangalore. Our placement cell helps more than 600 students per year.

Learn from experts active in their field, not out-of-touch trainers. Leading practitioners bring current best practices and case studies to sessions that fit into your work schedule. We have a pool of highly skilled and experienced experts and trainers who support you in specific tasks and provide professional guidance, along with 24x7 learning support from mentors and a community of like-minded peers to resolve any conceptual doubts. Our trainers have contributed to the growth of our clients as well as professionals.

All of our highly qualified trainers are industry experts with at least 10-12 years of relevant teaching experience. Each of them has gone through a rigorous selection process which includes profile screening, technical evaluation, and a training demo before they are certified to train for us. We also ensure that only those trainers with a high alumni rating continue to train for us.

No worries. DASVM Technologies ensures that no one misses a single lecture topic. We will reschedule classes at your convenience within the stipulated course duration wherever possible. If required, you can even attend that topic with any other batch.

DASVM Technologies provides many suitable modes of training to the students like:

  • Classroom training
  • One to One training
  • Fast track training
  • Live Instructor LED Online training
  • Customized training

Yes, access to the course material will be available for a lifetime once you have enrolled in the course.

You will receive a DASVM Technologies recognized course completion certificate, and our training will help you crack the global certification.

Yes, DASVM Technologies provides corporate training with course customization, learning analytics, cloud labs, certifications, real-time projects and 24x7 support.

Yes, DASVM Technologies provides group discounts for its training programs. Depending on the group size, we offer discounts as per the terms and conditions.

We accept all major payment options: cash, card (MasterCard, Visa, Maestro, etc.), wallets, net banking and cheques.

DASVM Technologies has a no-refund policy; fees once paid will not be refunded. If a candidate is unable to attend a training batch, he/she can reschedule to a future batch. The balance due should be cleared by the given date. If a trainer cancels or is unavailable to provide training, DASVM will arrange training sessions with a backup trainer.

Your access to the Support Team is for a lifetime and will be available 24/7. The team will help you resolve queries during and after the course.

Please contact our course advisor at +91-99003 49889, or share your queries through info@dasvmtechnologies.com.
