Scalable Machine Learning (SML)

Course Overview

This course introduces participants to scalable machine learning. It uses the Cloudera Hadoop and Spark frameworks to implement scalable machine learning algorithms in Scala or Python.

Who should attend

This program is designed for:

Software Developer
Data Scientist
Data Engineer
Big Data Engineer

Prerequisites

Participants should have knowledge equivalent to the “Data and Machine Learning Fundamentals” course.

Outline: Scalable Machine Learning (SML)

Introduction to Scalable Machine Learning

What is Scalable Machine Learning?
Why is it required?
Key platforms for performing Scalable Machine Learning
Scalable Machine Learning Project End to End Pipeline
Spark Introduction
Why Spark for Scalable Machine Learning?
Databricks Platform Demo
Approaches for scaling scikit-learn code
Hands-on Exercise(s): Experiencing the first notebook

Why Spark for Scalable Machine Learning (SML)?

Problems with Traditional Machine Learning Frameworks
Machine Learning at Scale – Various options
Iterative Algorithms
How does Spark perform well for Iterative Machine Learning Algorithms?
Hands-on Exercise(s)

Scalable Machine Learning on Enterprise Platform

Data Acquisition at Scale

Acquiring Structured content from Relational Databases
Acquiring Semi-structured content from Log Files
Acquiring Unstructured content from other key sources like Web
Tools for Performing Data acquisition at Scale
Sqoop, Flume and Kafka Introduction, use cases and architectures
Hands-on Exercise(s)

Data Pre-Processing for Modeling

Using the Spark Shell
Resilient Distributed Datasets (RDDs)
Functional Programming with Spark
RDD Operations
Key-Value Pair RDDs
MapReduce and Pair RDD Operations
Building and Running a Spark Application
Performing Data Validation
Data De-Duplication
Detecting Outliers
Hands-on Exercise(s)
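
As a rough illustration of the MapReduce and Pair RDD Operations topic, the sketch below imitates the map and reduce-by-key pattern in plain Python. It does not use Spark; the helper names `map_phase` and `reduce_by_key` are hypothetical and chosen only for this example. In PySpark the corresponding steps would typically be written with `flatMap`, `map` and `reduceByKey` on an RDD.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit one (word, 1) pair per word, like a map step over text lines."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_by_key(pairs, func):
    """Group values by key, then fold each group with func (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


# Sample input made up for this illustration.
lines = ["spark makes rdds", "rdds hold pairs", "spark scales"]
counts = reduce_by_key(map_phase(lines), lambda a, b: a + b)
print(counts)
```

The combining function passed to `reduce_by_key` is associative and commutative, which is the property a distributed engine would rely on to merge partial results from different partitions in any order.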

Working with Iterative Algorithms

Dealing with RDD Infinite Lineages
Caching Overview
Distributed Persistence
Checkpointing of an Iterative Machine Learning Algorithm
Hands-on Exercise(s)

Spark SQL

Introduction
DataFrame API
Performing ad-hoc query analysis using Spark SQL
Hands-on Exercise(s)

Spark Machine Learning using MLlib

Spark ML vs. Spark MLlib
Data types and key terms
Feature Extraction
Linear Regression using Spark MLlib
Hands-on Exercise(s)

Spark Machine Learning using ML

Spark ML Overview
Transformers and Estimators
Pipelines
Implementing Decision Trees
K-Means Clustering using Spark ML
Hands-on Exercise(s)

Decision Trees and Random Forest

Types – Classification and Regression trees
Gini Index, Entropy and Information Gain
Building Decision Trees
Pruning the trees
Prediction using Trees
Ensemble Models
Bagging and Boosting
Advantages of using Random Forest
Working with Random Forest
Ensemble Learning
How ensemble learning works
Building models using Bagging
Random Forest algorithm
Random Forest model building
Fine tuning hyper-parameters
Hands-on Exercise(s)
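
For orientation on the Gini Index, Entropy and Information Gain topic, here is a minimal plain-Python sketch of the three split measures. The function names and the toy labels are assumptions made for this example and are not taken from the course material or from any Spark API.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy data: a perfectly separating split of a balanced two-class node.
parent = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]
print(gini(parent))                     # 0.5
print(entropy(parent))                  # 1.0
print(information_gain(parent, split))  # 1.0
```

A tree builder would evaluate candidate splits with one of these measures and keep the split with the lowest weighted impurity or, equivalently, the highest gain.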

Model Evaluation, Optimization and Deployment

Model Evaluation
Optimizing a Model
Deploying a Model
Best Practices

Prices & Delivery methods

Online Training

Duration
3 days

Price
On request

Classroom Training

Duration
3 days

Price
On request

Schedule

Currently there are no training dates scheduled for this course.
