Apache Spark Developer | Agilitics





Apache Spark Developer

$1,000.00 $900.00

Overview – This four-day Spark Developer course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers interested in a thorough, hands-on overview of Apache Spark.

The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.

Objectives

After taking this class, you will be able to:

  • Describe Spark’s fundamental mechanics
  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines with SparkSQL and DataFrames
  • Analyze Spark jobs using the UIs and logs
  • Create Streaming and Machine Learning jobs

Prerequisites:


  • Basic to intermediate Linux knowledge, including:
    The ability to use a text editor, such as vi
    Familiarity with basic commands such as mv, cp, ssh, grep, cd, and useradd
  • Knowledge of application development principles
  • Knowledge of functional programming
  • Knowledge of Scala or Python
  • Beginner fluency with SQL

Course Overview

Lesson 1 – Introduction to Apache Spark

  • Describe the features of Apache Spark
  • Advantages of Spark
  • How Spark fits in with the Big Data application stack
  • How Spark fits in with Hadoop
  • Define Apache Spark components

Lesson 2 – Load and Inspect Data in Apache Spark

  • Describe different ways of getting data into Spark
  • Create and use Resilient Distributed Datasets (RDDs)
  • Apply transformations to RDDs
  • Use actions on RDDs
  • Load and inspect data in RDDs
  • Cache intermediate RDDs
  • Use Spark DataFrames for simple queries
  • Load and inspect data in DataFrames

Lesson 3 – Build a Simple Apache Spark Application

  • Define the lifecycle of a Spark program
  • Define the function of SparkContext
  • Create the application
  • Define different ways to run a Spark application
  • Run your Spark application
  • Launch the application

Lesson 4 – Work with Pair RDDs

  • Review loading and exploring data in RDDs
  • Load and explore data in RDDs
  • Describe and create Pair RDDs
  • Create and explore Pair RDDs
  • Control partitioning across nodes

Lesson 5 – Work with DataFrames

  • Create DataFrames
    From existing RDD
    From data sources
  • Work with data in DataFrames
    Use DataFrame operations
    Use SQL
    Explore data in DataFrames
  • Create user-defined functions (UDF)
    UDF used with Scala DSL
    UDF used with SQL
    Create and use user-defined functions
  • Repartition DataFrames
  • Supplemental Lab: Build a standalone application

Lesson 6 – Monitor Apache Spark Applications

  • Describe components of the Spark execution model
  • Use Spark Web UI to monitor Spark applications
  • Debug and tune Spark applications
  • Use the Spark Web UI

Lesson 7 – Introduction to Apache Spark Data Pipelines

  • Identify components of the Apache Spark Unified Stack
  • List benefits of Apache Spark over the Hadoop ecosystem
  • Describe data pipeline use cases

Lesson 8 – Create an Apache Spark Streaming Application

  • Describe Spark Streaming architecture
  • Create DStreams and a Spark Streaming application
  • Build and run a Streaming application which writes to HBase
  • Apply operations on DStream
  • Define window operations
    Build and run a Streaming application with SQL
    Build and run a Streaming application with windows and SQL
  • Describe how Streaming applications are fault-tolerant

Lesson 9 – Use Apache Spark GraphX

  • Describe GraphX
  • Define regular, directed, and property graphs

Lesson 10 – Use Apache Spark MLlib

  • Describe Spark MLlib
  • Describe machine learning techniques
    Collaborative filtering