Machine Learning with Python Scikit Learn (MLPSL)


Course Overview

This course is a stepping stone for the “Machine Learning and Artificial Intelligence” learning path. It is designed to build a foundation for the next level of courses in that path. The points below provide a high-level overview of the course:

  • Understand the role of Data and Machine Learning
  • Use cases of Machine Learning
  • Introduction to some of the key technologies in Data and Machine Learning
  • Hands-on experience in Data Acquisition, Processing, Analysis and Modeling using the Python programming language
  • Participants will work with various common types of data (e.g. CSV, Web data, Social Media data) for pre-processing and/or building Machine Learning Models
  • During the course, participants will also gain exposure to Exploratory Data Analysis while learning basic statistics

Who should attend

This program is designed for those who aspire to Data/ML/AI roles:

  • Data Engineers
  • Data Scientists
  • Machine Learning Engineers
  • Data Integration Engineers
  • Data Architects


Participants should preferably have some hands-on experience in a programming language. Knowledge of Python would be a plus.

Outline: Machine Learning with Python Scikit Learn (MLPSL)

Understanding the Big Picture

  • Significance of Data
  • What is Machine Learning (ML)?
  • Practical Use cases
  • Concepts and Terms
  • Tools/Platforms for ML
  • Machine Learning End to End Pipeline
  • Roles and Responsibilities of Data Engineer and Data Scientist

Environment for Experiments

  • Installing Anaconda
  • Setting up Jupyter Notebook
  • Experiencing Notebooks
  • Introduction to Google Colab
  • Hands-on Exercise(s)

Python for Data Science

  • Python Overview
  • Basic Syntax
  • Functions in Python
  • Lambda Function
  • Dealing with Semi-structured data
  • Higher Order Functions
  • User defined Functions
  • Hands-on Exercise(s)

Acquiring & Exporting Data

  • Content Acquisition Approaches, Pros & Cons
  • Working with Beautiful Soup
  • Acquiring data using REST-based APIs
  • Connecting to External data sources
  • Working with datasets
  • Manipulating the datasets
  • Exporting the datasets into external files

Basic Statistics

  • Population and Sample
  • Data Types
  • Measures of Central tendency
  • Measures of dispersion
  • Percentiles & Quartiles
  • Box plots and outlier detection
  • Creating Graphs and Reporting
  • Probability Distributions
  • Hypothesis testing
  • Hands-on Exercise(s)

NumPy Basics

  • Dealing with One-dimensional Arrays
  • Dealing with Multi-dimensional Arrays
  • Working with NumPy Array
  • NumPy Arrays Compared to Python Lists
  • Manipulating Arrays
  • Hands-on Exercise(s)

Pandas Basics

  • Basic types – Series and DataFrames
  • Working with a Series
  • Element-wise Operations
  • Creating a DataFrame from various sources e.g. CSV
  • Data Manipulation using Pandas
  • Hands-on Exercise(s)

Data Visualization

  • Overview
  • Key types of plots
  • Exploratory Analysis using Matplotlib
  • Hands-on Exercises
  • Introduction to Seaborn
  • Seaborn foundation
  • Key types of plots
  • Customizing Seaborn Plots
  • Hands-on Exercise(s)

Data Preparation for Analysis

  • Exploratory Data Analysis
  • Data Cleaning techniques
    • Deal with missing data
    • Add default values
    • Remove incomplete rows
    • Deal with error-prone columns
    • Fixing NaN values and string/float confusion
  • Data Preparation for ML
    • Normalize data types
    • Feature Scaling
    • Feature Standardization
    • Label Encoding
    • One-Hot Encoding
  • Hands-on Exercise(s)

Feature Engineering

  • What is Feature Engineering?
  • Why Feature Engineering?
  • How to apply Feature Engineering?
  • Discussions on various scenarios
  • Hands-on Exercise(s)

Machine Learning using Scikit Learn

  • Types of Machine Learning
  • Key Algorithms in Machine Learning
  • Practical Applications of Machine Learning
  • Popular frameworks/libraries for ML
  • Concepts and Terms
  • Why Scikit Learn?
  • Code Walkthrough
  • Hands-on Exercise(s)

Supervised Machine Learning

  • Key Classification Algorithms
  • Conditional Probability
  • Proof of Bayes Theorem
  • Naïve Bayes Classifier
  • Confusion Matrix
  • Accuracy
  • Key Regression Algorithms
  • Linear, Logistic and Other Key types of Regressions
  • Decision Trees
  • Ensemble Learning – Random Forest
  • Gradient Descent
  • Loss function
  • Bias vs Variance Tradeoff
  • Evaluating Models
  • Hyperparameter Tuning
  • Hands-on Exercise(s)

Unsupervised Machine Learning

  • Key types of Unsupervised ML
  • Principal Component Analysis
  • Performing Clustering of data
  • Hands-on Exercise(s)

Ensemble Learning

  • Understanding Ensemble Learning
  • Types of Ensemble Learning
  • Stacking
  • Bagging
  • Boosting
  • Random Forest
  • How do these work?
  • Hands-on Code Walkthrough
  • GBMs, XGBoost, LightGBM etc.
  • Hands-on Exercises
  • Hyperparameter Tuning
  • Feature Selection using Random Forest

K-Nearest Neighbors

  • Intuition behind KNN
  • Maths behind KNN
  • How to determine K?
  • Definition of Distance
  • Pros & Cons of KNN
  • Hands-on Case Study

Naïve Bayes and NLP

  • Conditional Probability
  • Proof of Bayes Theorem
  • Naïve Bayes Classifier
  • Pros & Cons of Naïve Bayes
  • Key Regression Algorithms
  • Implementing Spam Classifier
  • Natural Language Processing
  • Vectorizers, Pros & Cons
  • NLP Case Study
  • Hands-on Exercise(s)

Deployment of Models and Hackathon

  • Production Quality Code
  • Deploying Model on Google AI Platform
  • Hackathon

Prices & Delivery methods

Online Training

5 days

  • on request

Classroom Training

5 days

  • on request


Currently there are no training dates scheduled for this course.