Hortonworks Data Platform Developer: Apache Pig and Hive | Agilitics

Hortonworks Data Platform Developer: Apache Pig and Hive

Price: $2,800.00 (regular price $2,900.00)

Duration: 4 days
Version: HW HDP PH
Certification: HDP Certified Developer (HDPCD)

Overview – This course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core and Spark SQL.

Target Audience – Software developers who need to understand and develop applications for Hadoop.

Prerequisites – Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Course Objectives

  • Describe Hadoop, YARN and use cases for Hadoop
  • Describe Hadoop ecosystem tools and frameworks
  • Describe the HDFS architecture
  • Use the Hadoop client to input data into HDFS
  • Transfer data between Hadoop and a relational database
  • Explain YARN and MapReduce architectures
  • Run a MapReduce job on YARN
  • Use Pig to explore and transform data in HDFS
  • Understand how Hive tables are defined and implemented
  • Use Hive to explore and analyze data sets
  • Use the new Hive windowing functions
  • Explain and use the various Hive file formats
  • Create and populate a Hive table that uses the ORC file format
  • Use Hive to run SQL-like queries to perform data analysis
  • Use Hive to join datasets using a variety of techniques
  • Write efficient Hive queries
  • Create ngrams and context ngrams using Hive
  • Perform data analytics using the DataFu Pig library
  • Explain the uses and purpose of HCatalog
  • Use HCatalog with Pig and Hive
  • Define and schedule an Oozie workflow
  • Present the Spark ecosystem and high-level architecture
  • Perform data analysis with Spark’s Resilient Distributed Dataset API
  • Explore Spark SQL and the DataFrame API

Format

  • 50% Lecture/Discussion
  • 50% Hands-on Labs

Course Outline

DAY 1 – AN INTRODUCTION TO THE HADOOP DISTRIBUTED FILE SYSTEM

OBJECTIVES

  • Understanding Hadoop
  • The Hadoop Distributed File System
  • Ingesting Data into HDFS
  • The MapReduce Framework

LABS

  • Starting an HDP Cluster
  • Demonstration: Understanding Block Storage
  • Using HDFS Commands
  • Importing RDBMS Data into HDFS
  • Exporting HDFS Data to an RDBMS
  • Importing Log Data into HDFS Using Flume
  • Demonstration: Understanding MapReduce
  • Running a MapReduce Job

DAY 2 – AN INTRODUCTION TO APACHE PIG

OBJECTIVES

  • Introduction to Apache Pig
  • Advanced Apache Pig Programming

LABS

  • Demonstration: Understanding Apache Pig
  • Getting Started with Apache Pig
  • Exploring Data with Apache Pig
  • Splitting a Dataset
  • Joining Datasets with Apache Pig
  • Preparing Data for Apache Hive
  • Demonstration: Computing Page Rank
  • Analyzing Clickstream Data
  • Analyzing Stock Market Data Using Quantiles

DAY 3 – AN INTRODUCTION TO APACHE HIVE

OBJECTIVES

  • Apache Hive Programming
  • Using HCatalog
  • Advanced Apache Hive Programming

LABS

  • Understanding Hive Tables
  • Understanding Partition and Skew
  • Analyzing Big Data with Apache Hive
  • Demonstration: Computing NGrams
  • Joining Datasets in Apache Hive
  • Computing NGrams of Emails in Avro Format
  • Using HCatalog with Apache Pig

DAY 4 – WORKING WITH SPARK CORE, SPARK SQL AND OOZIE

OBJECTIVES

  • Advanced Apache Hive Programming (Continued)
  • Hadoop 2 and YARN
  • Introduction to Spark Core and Spark SQL
  • Defining Workflow with Oozie

LABS

  • Advanced Apache Hive Programming
  • Running a YARN Application
  • Getting Started with Apache Spark
  • Exploring Apache Spark SQL
  • Defining an Apache Oozie Workflow