HDP Analyst: Data Science Foundations | Agilitics

HDP Analyst: Data Science Foundations

$2,400.00 $2,100.00

Duration: 3 days
Version: HW HDP DS

Overview – This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. Tools and programming languages covered include Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn, the Natural Language Toolkit (NLTK), and Spark MLlib.

Target Audience – Architects, software developers, analysts and data scientists who need to apply data science and machine learning on Hadoop.


Prerequisites – Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.

Course Objectives

  • Recognize use cases for data science on Hadoop
  • Describe the Hadoop and YARN architecture
  • Describe the differences between supervised and unsupervised learning
  • Use Mahout to run a machine learning algorithm on Hadoop
  • Describe the data science life cycle
  • Use Pig to transform and prepare data on Hadoop
  • Write a Python script
  • Describe options for running Python code on a Hadoop cluster
  • Write a Pig User-Defined Function in Python
  • Use Pig streaming on Hadoop with a Python script
  • Use machine learning algorithms
  • Describe use cases for Natural Language Processing (NLP)
  • Use the Natural Language Toolkit (NLTK)
  • Describe the components of a Spark application
  • Write a Spark application in Python
  • Run machine learning algorithms using Spark MLlib
  • Take data science into production

Course Outline

Format

  • 50% Lecture/Discussion
  • 50% Hands-on Labs

Hands-On Labs

  • Lab: Setting Up a Development Environment
  • Demo: Block Storage
  • Lab: Using HDFS Commands
  • Demo: MapReduce
  • Lab: Using Apache Mahout for Machine Learning
  • Demo: Apache Pig
  • Lab: Getting Started with Apache Pig
  • Lab: Exploring Data with Pig
  • Lab: Using the IPython Notebook
  • Demo: The NumPy Package
  • Demo: The pandas Library
  • Lab: Data Analysis with Python
  • Lab: Interpolating Data Points
  • Lab: Defining a Pig UDF in Python
  • Lab: Streaming Python with Pig
  • Demo: Classification with Scikit-Learn
  • Lab: Computing K-Nearest Neighbor
  • Lab: Generating a K-Means Clustering
  • Lab: POS Tagging Using a Decision Tree
  • Lab: Using NLTK for Natural Language Processing
  • Lab: Classifying Text Using Naive Bayes
  • Lab: Using Spark Transformations and Actions
  • Lab: Using Spark MLlib
  • Lab: Creating a Spam Classifier with MLlib