Apache Hadoop Data Analyst | Agilitics





Apache Hadoop Data Analyst


No. of Days: 4

Day 1: Hadoop Introduction: 4 hours
• Why we need Hadoop
• Why Hadoop is in high demand in the market nowadays
• Where expensive SQL-based tools fall short
• Key points on why Hadoop is a leading tool in the current IT industry
• Definition of Hadoop nodes
• Introduction to Hadoop Release-1
• Hadoop Daemons in Hadoop Release-1
• Introduction to Hadoop Release-2
• Hadoop Daemons in Hadoop Release-2
• Hadoop Cluster and Racks
• Hadoop Cluster Demo
• New projects on Hadoop
• How open-source tools can run jobs in less time
• Hadoop storage – HDFS (Hadoop Distributed File System)
• Hadoop processing frameworks (Map Reduce / YARN)
• Alternatives to Map Reduce
• Why NoSQL is in high demand compared to SQL
• Distributed warehouse for HDFS
• Hadoop Ecosystem and its usages
• Data import/Export tools
Day 2 : Hadoop Installation and Hands-on on a Hadoop Machine : 4 hours
• Hadoop installation
• Introduction to the Hadoop FS and processing environment UIs
• How to read and write files
• Basic Unix commands for Hadoop
• Hadoop FS shell
• Hadoop releases practical
• Hadoop daemons practical
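A hands-on FS shell session on a running cluster might look like the following (the local file name and HDFS paths are illustrative; a working Hadoop installation is assumed):

```shell
# Create a directory in HDFS and copy a local file into it
hadoop fs -mkdir -p /user/student/input
hadoop fs -put sample.txt /user/student/input/

# List the directory and inspect the file's contents
hadoop fs -ls /user/student/input
hadoop fs -cat /user/student/input/sample.txt

# Copy the file back to the local file system
hadoop fs -get /user/student/input/sample.txt ./sample_copy.txt
```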
Day 2 : ETL Tool (Pig) Introduction Level-1 (Basics) : 4 hours
• Pig Introduction
• Why Pig if Map Reduce is there?
• How Pig differs from programming languages
• Pig data flow introduction
• How schema is optional in Pig
• Pig data types
• Pig commands – Load, Store, Describe, Dump
• Map Reduce jobs started by Pig commands
• Execution plan
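A minimal Pig Latin session tying the Load, Describe, Dump and Store commands together might look like this (the file path and schema are illustrative):

```pig
-- Load a file with an optional schema
users = LOAD '/user/student/users.txt' AS (name:chararray, age:int);

-- Inspect the schema, then filter the relation
DESCRIBE users;
adults = FILTER users BY age >= 18;

-- DUMP triggers the underlying Map Reduce job; STORE writes results to HDFS
DUMP adults;
STORE adults INTO '/user/student/adults_out';
```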
Day 2 : ETL Tool (Pig) Level-2 (Complex) : 4 hours
• Pig- UDFs
• Pig Use cases
• Pig Assignment
• Complex Use cases on Pig
• Real time scenarios on Pig
• When we should use Pig
• When we shouldn’t use Pig
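Pig UDFs can be written in Python and run by Pig through Jython. A minimal sketch (the function name and schema are illustrative; the `pig_util` import is what Pig provides at runtime, and a stub decorator is defined so the function can also be tried outside Pig):

```python
# string_udfs.py — a simple Pig UDF written in Python.
# In a Pig script it would be registered and used roughly as:
#   REGISTER 'string_udfs.py' USING jython AS myfuncs;
#   cleaned = FOREACH users GENERATE myfuncs.normalize(name);
try:
    from pig_util import outputSchema  # available when Pig runs the script
except ImportError:
    # Stub so the function can also be tested outside of Pig
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

@outputSchema("name:chararray")
def normalize(value):
    """Trim whitespace and lower-case a chararray field."""
    if value is None:
        return None
    return value.strip().lower()

print(normalize("  Hadoop  "))  # hadoop
```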
Day 3 : Hive Warehouse (3 hours)
• Hive Introduction
• Meta storage and meta store
• Introduction to Derby Database
• Hive Data types
• DDL, DML and sub languages of Hive
• Internal, external and temp tables in Hive
• Differences between a SQL-based data warehouse and Hive
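The internal vs. external distinction shows up directly in Hive DDL; a sketch (table names, columns and the HDFS location are illustrative):

```sql
-- Internal (managed) table: dropping it deletes the data as well
CREATE TABLE sales (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table: Hive tracks only the metadata; dropping it
-- leaves the files under /data/sales_raw untouched
CREATE EXTERNAL TABLE sales_raw (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales_raw';
```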
Day 3 : Hive Level-2 (Complex) (3 Hours)
• Hive releases
• Why Hive is not the best solution for OLTP
• OLAP in Hive
• Partitioning
• Bucketing
• Hive Architecture
• Hue Interface for Hive
• How to analyze data using Hive scripts
• Differences between Hive and Impala
• UDFs in Hive
• Complex Use cases in Hive
• Hive Advanced Assignment
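Partitioning and bucketing can both be declared at table-creation time; a sketch (table and column names are illustrative):

```sql
-- Each country value becomes its own HDFS subdirectory, so queries
-- with WHERE country = '...' scan only that partition
CREATE TABLE page_views (user_id INT, url STRING)
PARTITIONED BY (country STRING)
CLUSTERED BY (user_id) INTO 16 BUCKETS;  -- bucketing by hash of user_id
```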
Day 3 : Introduction to Map Reduce (2 Hours)
• How Map Reduce works as a processing framework
• End-to-end execution flow of a Map Reduce job
• Different tasks in a Map Reduce job
• Why the Reducer is optional while the Mapper is mandatory
• Introduction to the Partitioner
• Programming languages for Map Reduce
• Why Java is preferred for Map Reduce programming
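The mapper/reducer split can be sketched in Python in the style of Hadoop Streaming. This is a local simulation for intuition only; on a real cluster the framework runs the mappers and reducers on different nodes and handles the shuffle between the two phases:

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum the counts for a single key."""
    return (word, sum(counts))

def run_job(lines):
    """Simulate shuffle-and-sort: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(w, c) for w, c in sorted(grouped.items()))

print(run_job(["hadoop is big", "big data is big"]))
# {'big': 3, 'data': 1, 'hadoop': 1, 'is': 2}
```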
Day 4 : NoSQL Databases and Introduction to HBase (2 Hours)
• Introduction to NoSQL
• Why NoSQL when SQL has been in the market for many years
• NoSQL-based databases in the market
• CAP Theorem
• OLTP solutions with different capabilities
• Which NoSQL-based solution can handle specific requirements
• Examples of companies that use NoSQL-based databases
• HBase architecture and column families
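Column families are declared when an HBase table is created; a short session inside the HBase shell might look like this (table, family and row names are illustrative; a running HBase instance is assumed):

```shell
# Inside the HBase shell (started with: hbase shell)
create 'users', 'profile', 'activity'          # table with two column families
put 'users', 'row1', 'profile:name', 'alice'   # write one cell
get 'users', 'row1'                            # read the row back
scan 'users'                                   # scan the whole table
```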
Day 4 : ZooKeeper and Sqoop (2 Hours)
• Introduction to ZooKeeper
• How ZooKeeper helps in the Hadoop ecosystem
• How to load data from relational storage into Hadoop
• Sqoop basics
• Sqoop practical implementation
• Sqoop alternatives
• Sqoop connectors
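A typical Sqoop import/export pair might look like this (the JDBC URL, credentials, table names and HDFS paths are all illustrative; a reachable database and cluster are assumed):

```shell
# Import one MySQL table into HDFS, split across 4 map tasks
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username student --password-file /user/student/.dbpass \
  --table orders \
  --target-dir /user/student/orders \
  --num-mappers 4

# Export processed results back into a relational table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username student --password-file /user/student/.dbpass \
  --table order_summary \
  --export-dir /user/student/summary_out
```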
Day 4 : Flume, Oozie and YARN (2 Hours)
• How to load streaming data without a fixed schema
• How to load unstructured and semi-structured data into Hadoop
• Introduction to Flume
• Hands-on with Flume
• How to load Twitter data into HDFS
• Introduction to Oozie
• How to schedule jobs using Oozie
• What kind of jobs can be scheduled using Oozie
• How to schedule jobs which are time based
• Hadoop releases
• Where to get Hadoop and other components for installation
• Introduction to YARN
• Significance of YARN
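Time-based scheduling in Oozie is done with a coordinator; a minimal sketch of a coordinator definition (the app name, dates and workflow path are illustrative):

```xml
<!-- coordinator.xml: run a workflow once a day (times illustrative) -->
<coordinator-app name="daily-etl" frequency="${coord:days(1)}"
                 start="2018-01-10T01:00Z" end="2018-12-31T01:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${nameNode}/user/student/apps/etl-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```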
Day 4 : Apache Spark Basics (2 Hours)
• Introduction to Spark
• Basic features of Spark and Scala available in Hue
• Why demand for Spark is increasing in the market
• How we can use Spark with the Hadoop ecosystem
• Datasets for practice purposes
Day 4 : Emerging Trends of Big Data (2 Hours)
• Emerging technologies of Big Data
• Emerging use cases, e.g. IoT, Industrial Internet, new applications
• Certifications and job opportunities
