Hadoop Topics

1. Introduction to Hadoop
            Enterprise Data Trends @ Scale
            What is Big Data?
            A Market for Big Data
            Characteristics of Big Data: the 3 Vs, 5 Vs, and 7 Vs
            Most Common New Types of Data
            Moving from Causation to Correlation
            What is Hadoop? And Why Hadoop?
            Traditional Systems vs. Hadoop
            What is Hadoop 2.0?
            Overview of a Hadoop Cluster and Core components of Hadoop
            Different distributions of Hadoop
            Hadoop Use Case
            Lab exercise: Log in to Your Cluster
2. Hadoop Architecture
            Characteristics of Hadoop
                        a. Fault tolerance
                        b. Replication
                        c. Block size
                        d. Robustness
            What is a Node, Rack, Cluster, Data Center, and Data Hub?
            MapReduce Architecture
            HDFS Architecture
            Understanding Block Storage
            Demonstration: Understanding Block Storage
            The NameNode
            The DataNodes
            HDFS Clients
3. Installing Hadoop Cluster using Cloudera Manager
            Minimum Hardware Requirements
            Minimum Software Requirements
            A Formidable Starter Cluster
            Lab exercise: Setting up the Environment
            Lab exercise: Installing Cloudera Manager and CDH
            Lab exercise: Adding Services to the Cluster
4. Configuring Hadoop
            Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, bigtop-utils, and the masters and slaves files)
            Configuration Considerations
            Deployment Layout
            Configuring Hadoop Ports
            Configuring HDFS
            What Does the File System Check Look For?
            Replication Factor
            Understanding Hadoop Logs
            What is Cloudera Manager?
            Configuration via Cloudera Manager
            Management and Monitoring
            REST API and Thrift Server Overview
            Lab exercise: Commissioning and Decommissioning of Nodes
            Lab exercise: Stopping and Starting CDH Services
            Lab exercise: Using HDFS Commands, hadoop fsck Syntax, and the hadoop dfsadmin Command
5. Ensuring Data Integrity
            Replication Placement
            Data Integrity - Writing Data
            Data Integrity - Reading Data
            Data Integrity - Block Scanning
            Running a File System Check
            What Does the File System Check Look For?
            hadoop fsck Syntax
            Data Integrity - File System Check: Commands & Output
            hadoop dfsadmin Command
            NameNode Information
            Changing the Replication Factor
            Lab exercise: Verify Data with Block Scanner and fsck
7. MapReduce and YARN
            MapReduce
            Understanding MapReduce
            What is YARN?
            YARN Architecture (RM, NM, AM, Container)
            Lifecycle of a YARN Application
            Configuring YARN
            Configuring MapReduce Tools
            YARN Application Logs
            YARN CLI
            Lab exercise: Troubleshooting a MapReduce Job

8. Job Schedulers
            Overview of Job Scheduling
            The Built-in Schedulers
            Overview of the Capacity Scheduler
            Configuring the Capacity Scheduler
            Defining Queues
            Configuring Capacity Limits
            Configuring User Limits
            Configuring Permissions
            Overview of the Fair Scheduler
            Multi-Tenancy Limits
            Lab exercise: Configuring the Capacity Scheduler

9. Enterprise Data Movement, Backup, and Recovery
            What Should You Back Up?
            HDFS Snapshots
            HDFS Data - Backups
            HDFS Data - Automate & Restore
            Overview of BDR (Backup and Disaster Recovery)
            Lab exercise: Using HDFS Snapshots
            Managing Resources
                        - Configuring Groups with Static Service Pools
                        - The Fair Scheduler
                        - Configuring Dynamic Resource Pools
                        - YARN Memory and CPU Settings
    
10. Hive Administration
            Introduction and Architecture of Hive
            Comparing Hive with RDBMS
            Hive Components: Hive Metastore, HiveServer2, HCatalog
            Hive Clients: Beeline

11. Sqoop
            Overview of Sqoop
            The Sqoop Import Tool
            Importing a Table
            Importing Specific Columns
            The Sqoop Export Tool
            Lab exercise: Using Sqoop
12. Flume
            Flume Introduction
            Installing Flume
            Flume Configuration
            Monitoring Flume
            Lab exercise: Install and Test Flume
       
13. Oozie
            Oozie Overview
            Oozie Components
            Jobs, Workflows, Coordinators, Bundles
            Workflow Actions and Decisions
            Oozie Job Submission
            Oozie Console
            The Oozie CLI
            Using the Oozie CLI
            Oozie Actions
            Lab exercise: Running an Oozie Workflow

15. HBase
            Overview
            Why HBase?
            Architecture
            HBase Components and Daemons
            HBase Administration and Cluster Management

Cluster Monitoring and Troubleshooting
            Cloudera Manager Monitoring Features
            Configuring Events and Alerts
            Monitoring Hadoop Clusters
            Troubleshooting Hadoop Services
            Common Misconfigurations
            Monitoring Cluster Services using Charts
            Using the Trigger Option
            Monitoring JVM Processes
            Understanding JVM Memory
            Eclipse Memory Analyzer
            JVM Memory Heap Dump
            Java Management Extensions (JMX)
            Garbage Collection Tuning



16. Commissioning and Decommissioning of Cluster Nodes
            Decommissioning and Commissioning Nodes
            Decommissioning Nodes
            Steps for Decommissioning a Node
            Decommissioning Node States
            Steps for Commissioning a Node
            Balancer
            Balancer Threshold Setting
            Configuring Balancer Bandwidth
            Lab exercise: Commissioning and Decommissioning Nodes
17. Backup and Recovery
            What Should You Back Up?
            HDFS Snapshots
            HDFS Data - Backups
            HDFS Data - Automate & Restore
            Hive & Backup
            BDR (Backup and Disaster Recovery)
            Lab exercise: Using HDFS Snapshots

18. Rack Awareness
            Rack Awareness
            YARN Rack Awareness
            Replica Placement
            Rack Topology
            Rack Topology Script
            Configuring the Rack Topology Script
            Lab exercise: Configuring Rack Awareness
19. NameNode High Availability
            NameNode Architecture in Cloudera
            NameNode High Availability
            HDFS HA Components
            Understanding NameNode HA
            NameNodes in HA
            Failover Modes
            NameNode Architectures
            hdfs haadmin Command
            Protecting Metadata Repositories
            Lab exercise: Configure NameNode High Availability using Cloudera Manager
20. Security in Hadoop
            Security Concepts: Why is Hadoop Security Required?
            Kerberos Synopsis: How Does It Work?
            Enabling Kerberos via Cloudera Manager
            Lab exercise: Installing and Configuring Kerberos

Miscellaneous
            Overview and Architecture of the following:
                        Kafka
                        Solr