Big Data Analytics

Price: $1,300.00 (regular price $1,500.00)

Description
Day 1: Hadoop Introduction (5 hours)

  • Why we need Hadoop
  • Why Hadoop is in demand in the market nowadays
  • Where expensive SQL-based tools are failing
  • Key points: why Hadoop is the leading tool in the current IT industry
  • Definition of Big Data
  • Hadoop nodes
  • Introduction to Hadoop Release-1
  • Hadoop Daemons in Hadoop Release-1
  • Introduction to Hadoop Release-2
  • Hadoop Daemons in Hadoop Release-2
  • Hadoop Cluster and Racks
  • Hadoop Cluster Demo
  • New projects on Hadoop
  • How open-source tools are capable of running jobs in less time
  • Hadoop Storage – HDFS (Hadoop Distributed File System) (see the sketch after this list)
  • Hadoop Processing Framework (MapReduce / YARN)
  • Alternatives to MapReduce
  • Why NoSQL is in high demand compared to SQL
  • Distributed warehouse for HDFS
  • Hadoop Ecosystem and its uses
  • Data import/export tools
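
To make the HDFS storage items above concrete, here is a minimal sketch (not part of the original outline) that uses the Hadoop Java FileSystem API to print how HDFS stores a file; the class name is made up and it assumes HDFS is configured as the default file system.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative helper: prints block size, replication factor and length of an HDFS file.
    public class HdfsBlockInfo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);       // connects to the default file system (HDFS)
            FileStatus status = fs.getFileStatus(new Path(args[0]));
            System.out.println("Block size (bytes):  " + status.getBlockSize());
            System.out.println("Replication factor:  " + status.getReplication());
            System.out.println("File length (bytes): " + status.getLen());
        }
    }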

Day 1: Introduction to MapReduce and YARN (2 hours)

  • How MapReduce works as a processing framework
  • End-to-end execution flow of a MapReduce job
  • Different tasks in a MapReduce job
  • Why the Reducer is optional while the Mapper is mandatory
  • Introduction to the Combiner
  • Introduction to the Partitioner
  • Programming languages for MapReduce
  • Why Java is preferred for MapReduce programming (see the sketch below)
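
Since the outline notes that Java is the preferred language for MapReduce, a minimal word-count sketch using the org.apache.hadoop.mapreduce API follows; the class names and whitespace tokenisation are illustrative choices, not course material.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: emits (word, 1) for every token in its input split.
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sums the counts per word; the Reducer is optional, and this same
    // class could also be registered as the job's Combiner.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }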

Day 2: Hadoop Installation and Hands-on on a Hadoop Machine (4 hours)

  • Hadoop installation
  • Introduction to the Hadoop FS and processing environment UIs
  • How to read and write files
  • Basic Unix commands for Hadoop
  • Hadoop FS shell (examples below)
  • Hadoop releases practical
  • Hadoop daemons practical
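
The "Hadoop FS shell" item above refers to the hadoop fs command line; a short illustrative session is sketched below (the paths and file names are placeholders).

    # assumes HDFS is up and the current user has a home directory
    hadoop fs -mkdir -p /user/student/input                       # create a directory in HDFS
    hadoop fs -put notes.txt /user/student/input/                 # write: copy a local file into HDFS
    hadoop fs -ls /user/student/input                             # list the directory
    hadoop fs -cat /user/student/input/notes.txt                  # read the file contents
    hadoop fs -get /user/student/input/notes.txt ./notes_copy.txt # copy the file back to local disk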

Day 2: ETL Tool – Pig (3 hours)

  • Pig Introduction
  • Why Pig when MapReduce is already there?
  • How Pig differs from programming languages
  • Introduction to Pig data flow
  • How Schema is optional in Pig
  • Pig Data types
  • Pig commands – Load, Store, Describe, Dump (see the example below)
  • MapReduce jobs started by Pig commands
  • Execution plan
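
To illustrate the Load, Store, Describe and Dump commands listed above, here is a minimal Pig Latin sketch; the input path, delimiter and column names are assumptions made for the example.

    -- assumes a comma-separated employee file already sits in HDFS
    emp  = LOAD '/user/student/emp.csv' USING PigStorage(',')
           AS (id:int, name:chararray, dept:chararray, salary:double);
    DESCRIBE emp;                          -- shows the (optional) schema
    high = FILTER emp BY salary > 50000;   -- a simple transformation
    DUMP high;                             -- starts the underlying MapReduce job and prints the result
    STORE high INTO '/user/student/high_salary' USING PigStorage(',');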

Day 3: Hive Warehouse (3 hours)

  • Hive Introduction
  • Metadata storage and the Hive metastore
  • Introduction to Derby Database
  • Hive Data types
  • HQL
  • DDL, DML and the sublanguages of Hive
  • Internal, external and temporary tables in Hive (see the sketch below)
  • Differences between a SQL-based data warehouse and Hive
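
A minimal HiveQL sketch of the internal vs. external table distinction covered above; the table names, columns and HDFS paths are assumptions made for the example.

    -- internal (managed) table: dropping it also deletes its data in the warehouse directory
    CREATE TABLE sales_internal (id INT, product STRING, amount DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    -- external table: dropping it removes only the metadata, the HDFS data stays
    CREATE EXTERNAL TABLE sales_external (id INT, product STRING, amount DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/student/sales';

    LOAD DATA INPATH '/user/student/sales.csv' INTO TABLE sales_internal;
    SELECT product, SUM(amount) FROM sales_internal GROUP BY product;
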
Day 3: Hive Level-2 – Complex (2 hours)

  • Hive releases
  • Why Hive is not the best solution for OLTP
  • OLAP in Hive
  • Partitioning
  • Bucketing (see the sketch below)
  • Hive Architecture
  • Thrift Server
  • Hue Interface for Hive
  • How to analyze data using Hive scripts
  • Differences between Hive and Impala
  • UDFs in Hive
  • Complex Use cases in Hive
  • Hive Advanced Assignment
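
A short HiveQL sketch of the partitioning and bucketing topics above; the layout and settings are assumptions, and sales_internal is the hypothetical table from the earlier example.

    -- partition by date and bucket by id (8 buckets); numbers and columns are illustrative
    CREATE TABLE sales_part (id INT, product STRING, amount DOUBLE)
    PARTITIONED BY (sale_date STRING)
    CLUSTERED BY (id) INTO 8 BUCKETS
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    INSERT INTO TABLE sales_part PARTITION (sale_date)
    SELECT id, product, amount, sale_date FROM sales_internal;

    -- filtering on the partition column lets Hive read only the matching directories
    SELECT SUM(amount) FROM sales_part WHERE sale_date = '2018-01-01';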

Day 3: NoSQL Databases and Introduction to HBase (2 hours)

  • Introduction to NoSQL
  • Why NoSQL when SQL has been in the market for so many years
  • NoSQL-based databases in the market
  • CAP Theorem
  • ACID Vs. CAP
  • OLTP Solutions with different capabilities
  • Which NoSQL-based solution can handle which specific requirements
  • Examples of companies that use NoSQL-based databases
  • HBase architecture and column families (see the sketch below)
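
To make the column-family model concrete, here is a minimal sketch with the HBase Java client API; the table name "users" and the column family "info" are assumptions, and the table is expected to exist already.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBasePutGet {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("users"))) {
                // write one cell: row key "u1", column family "info", qualifier "name"
                Put put = new Put(Bytes.toBytes("u1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                table.put(put);

                // read the row back by key and extract the cell value
                Result result = table.get(new Get(Bytes.toBytes("u1")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
            }
        }
    }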

Day 4: Data Ingestion – Flume and Sqoop (2 hours)

  • How to load data from relational storage into Hadoop
  • Sqoop basics
  • Sqoop practical implementation (see the example below)
  • Sqoop alternatives
  • Sqoop connectors
  • How to load streaming data without a fixed schema
  • How to load unstructured and semi-structured data into Hadoop
  • Introduction to Flume
  • Hands-on on Flume
  • How to load Twitter data into HDFS using Flume
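
The Sqoop hands-on usually centres on a single import command; an illustrative example follows (the JDBC URL, credentials file, table and target directory are all placeholders).

    # imports the "orders" table from MySQL into HDFS using 4 parallel map tasks
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/retail \
      --username student \
      --password-file /user/student/.db_password \
      --table orders \
      --target-dir /user/student/orders \
      --num-mappers 4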

Day 4: ZooKeeper and Oozie (0.5 hours)

  • Introduction to Oozie
  • Introduction to Zookeeper
  • How ZooKeeper helps in the Hadoop ecosystem (see the sketch below)
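
As a small taste of how ecosystem components coordinate through ZooKeeper, here is a minimal sketch with the ZooKeeper Java client that creates a znode and reads it back; the connection string and znode path are assumptions.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Illustrative demo: create a persistent znode and read its data back.
    public class ZkZnodeDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> { });
            zk.create("/demo", "hello".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            byte[] data = zk.getData("/demo", false, null);
            System.out.println("znode /demo contains: " + new String(data));
            zk.close();
        }
    }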

Day 4: Apache Spark Basics (3 hours)

  • Introduction to Spark
  • Basic features of Spark and Scala available in Hue
  • Why demand for Spark is increasing in the market
  • How we can use Spark with the Hadoop ecosystem (see the sketch below)
  • Datasets for practice purposes
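
The course itself demonstrates Spark with Scala in Hue; purely as an illustration of how Spark runs over data in HDFS, a minimal word-count sketch using Spark's Java RDD API is shown below (input and output paths are placeholders).

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            // The master URL is supplied by spark-submit; only the application name is set here.
            SparkConf conf = new SparkConf().setAppName("SparkWordCount");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                JavaRDD<String> lines = sc.textFile("hdfs:///user/student/input");
                JavaPairRDD<String, Integer> counts = lines
                        .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator()) // Spark 2.x: returns an Iterator
                        .mapToPair(word -> new Tuple2<>(word, 1))
                        .reduceByKey(Integer::sum);
                counts.saveAsTextFile("hdfs:///user/student/output");
            }
        }
    }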

Day 4: Emerging Trends in Big Data (0.5 hours)

  • YARN
  • Emerging Technologies of Big Data
  • Emerging use cases, e.g. IoT, the Industrial Internet, and new applications
  • Certifications and job opportunities