Apache Spark Developer | Agilitics





Apache Spark Developer

No. of Days: 4

This four-day Spark Developer course is for data engineers, analysts,
architects, software engineers, IT operations staff, and technical
managers interested in a thorough, hands-on overview of Apache Spark.
The course covers the core APIs for using Spark, fundamental
mechanisms and basic internals of the framework, SQL and other
high-level data access tools, as well as Spark’s streaming capabilities
and machine learning APIs.


    After taking this class you will be able to:

  • Describe Spark’s fundamental mechanics
  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines with SparkSQL and DataFrames
  • Analyze Spark jobs using the UIs and logs
  • Create Streaming and Machine Learning jobs

Prerequisites:

  • Required
      • Basic to intermediate Linux knowledge, including:
          The ability to use a text editor, such as vi
          Familiarity with basic commands such as mv,
          cp, ssh, grep, cd, useradd
      • Knowledge of application development principles
  • Recommended
      • Knowledge of functional programming
      • Knowledge of Scala or Python
      • Beginner fluency with SQL

Course Overview
Lesson 1 – Introduction to Apache Spark (Day 1: 4 hours)

  • Describe the features of Apache Spark
  • Advantages of Spark
  • How Spark fits in with the Big Data application stack
  • How Spark fits in with Hadoop
  • Define Apache Spark components

Lesson 2 – Load and Inspect Data in Apache Spark (Day 1: 4 hours)

  • Describe different ways of getting data into Spark
  • Create and use Resilient Distributed Datasets (RDDs)
  • Apply transformations to RDDs
  • Use actions on RDDs
  • Load and inspect data in RDD
  • Cache intermediate RDDs
  • Use Spark DataFrames for simple queries
  • Load and inspect data in DataFrames

Lesson 3 – Build a Simple Apache Spark Application (Day 2: 4 hours)

  • Define the lifecycle of a Spark program
  • Define the function of SparkContext
  • Create the application
  • Define different ways to run a Spark application
  • Run your Spark application
  • Launch the application

Lesson 4 – Work with PairRDD (Day 2: 4 hours)

  • Review loading and exploring data in RDD
  • Load and explore data in RDD
  • Describe and create Pair RDD
  • Create and explore PairRDD
  • Control partitioning across nodes

Lesson 5 – Work with DataFrames (Day 3: 3 hours)

  • Create DataFrames
      • From existing RDD
      • From data sources
  • Work with data in DataFrames
      • Use DataFrame operations
      • Use SQL
  • Explore data in DataFrames
  • Create user-defined functions (UDF)
      • UDF used with Scala DSL
      • UDF used with SQL
  • Create and use user-defined functions
  • Repartition DataFrames
  • Supplemental Lab: Build a standalone application

Lesson 6 – Monitor Apache Spark Applications (Day 3: 2 hours)

  • Describe components of the Spark execution model
  • Use Spark Web UI to monitor Spark applications
  • Debug and tune Spark applications
  • Use the Spark Web UI

Lesson 7 – Introduction to Apache Spark Data Pipelines (Day 3: 3 hours)

  • Identify components of Apache Spark Unified Stack
  • List benefits of Apache Spark over Hadoop ecosystem
  • Describe data pipeline use cases

Lesson 8 – Create an Apache Spark Streaming Application (Day 4: 4 hours)

  • Describe Spark Streaming architecture
  • Create DStreams and a Spark Streaming application
  • Build and run a Streaming application that writes to HBase
  • Apply operations on DStream
  • Define window operations
  • Build and run a Streaming application with SQL
  • Build and run a Streaming application with Windows and SQL
  • Describe how Streaming applications are fault-tolerant

Lesson 9 – Use Apache Spark GraphX (Day 4: 2 hours)

  • Describe GraphX
  • Define regular, directed, and property graphs
  • Create a property graph
  • Perform operations on graphs
  • Create a property graph
  • Apply graph operations

Lesson 10 – Use Apache Spark MLlib (Day 4: 2 hours)

  • Describe Spark MLlib
  • Describe the Machine Learning techniques
      • Classification
      • Clustering
      • Collaborative filtering
  • Use collaborative filtering to predict user choice
  • Load and inspect data using the Spark shell