Module 1
Introduction to Big Data
Rise of Big Data
Compare Hadoop vs traditional systems
Hadoop Master-Slave Architecture
Understanding HDFS Architecture
NameNode, DataNode, Secondary NameNode
Learn about JobTracker, TaskTracker
Module 2
HDFS and MapReduce Architecture
Core components of Hadoop
Understanding Hadoop Master-Slave Architecture
Learn about NameNode, DataNode, Secondary NameNode
Understanding HDFS Architecture
Anatomy of Read and Write data on HDFS
MapReduce Architecture Flow
JobTracker and TaskTracker
Module 3
Hadoop Configuration
Hadoop Modes
Hadoop Terminal Commands
Cluster Configuration
Web Ports
Hadoop Configuration Files
Reporting, Recovery
Module 4
Understanding Hadoop MapReduce Framework
Overview of the MapReduce Framework
Use cases of MapReduce
MapReduce Architecture
Anatomy of MapReduce Program
Mapper/Reducer Class, Driver code
Module 5
Advanced MapReduce Part 1
Write your own Partitioner
Writing Map and Reduce in Python
Map side/Reduce side Join
Distributed Join
Distributed Cache
Counters
Joining Multiple datasets in MapReduce
Module 6
Advanced MapReduce Part 2
MapReduce internals
Understanding Input Format
Custom Input Format
Using Writable and Comparable
Understanding Output Format
Sequence Files
JUnit and MRUnit Testing Frameworks
Module 7
Apache Pig
Pig vs MapReduce
Pig Architecture & Data types
Pig Latin Relational Operators
Pig Latin Join and CoGroup
Pig Latin Group and Union
Describe, Explain, Illustrate
Pig Latin: File Loaders & UDF
Module 8
Apache Hive And Hive QL
What is Hive
Hive DDL – Create/Show Database
Hive DDL – Create/Show/Drop Tables
Hive DML – Load Files & Insert Data
Hive SQL – Select, Filter, Join, Group By
Hive Architecture & Components
Difference between Hive and RDBMS
Module 9
Apache HiveQL
Multi-Table Inserts
Joins
Grouping Sets, Cubes, Rollups
Custom Map and Reduce scripts
Hive SerDe
Hive UDF
Hive UDAF
Module 10
Apache Flume, Sqoop, Oozie
Sqoop – How Sqoop works
Sqoop Architecture
Flume – How it works
Flume Complex Flow – Multiplexing
Oozie – Simple/Complex Flow
Oozie Service/Scheduler
Use Cases – Time and Data triggers
Module 11
NoSQL Databases
CAP theorem
RDBMS vs NoSQL
Key Value stores: Memcached, Riak
Key Value stores: Redis, DynamoDB
Column Family: Cassandra, HBase
Graph Store: Neo4j
Document Store: MongoDB, CouchDB
Module 12
Apache HBase
When/Why to use HBase
HBase Architecture/Storage
HBase Data Model
HBase Families/Column Families
HBase Master
HBase vs RDBMS
Access HBase Data
Module 13
Apache ZooKeeper
What is ZooKeeper
ZooKeeper Data Model
ZNode Types
Sequential ZNodes
Installing and Configuring
Running ZooKeeper
ZooKeeper use cases
Module 14
Hadoop 2.0, YARN, MRv2
Hadoop 1.0 Limitations
MapReduce Limitations
HDFS 2: Architecture
HDFS 2: High availability
HDFS 2: Federation
YARN Architecture
Classic vs YARN
YARN multitenancy
YARN Capacity Scheduler
Module 15
Projects
Demo of 3 sample projects