OpenHack – DevOps for Data Science (OHDTSC)

 

Course Overview

Please note that attendees work together in teams of at least 5, and the advertised pricing is per team of 5.

Course Content

This OpenHack provides a real-world context within which the various data science, machine learning, and DevOps capabilities of Azure can be applied to an end-to-end data science scenario, enabling participants to compare and validate options.

  • An entry point for people who are new to machine learning
  • A DevOps upskilling path for ML developers and data scientists

Who should attend

Experts with different skill sets and workflows across data engineering, data science, machine learning engineering, development, operations, and other knowledge areas work together collaboratively, emphasizing the skills and strengths of each team member. The OpenHack is aimed at:

  • Software Engineers
  • Cloud Solution Architects
  • Data Scientists
  • ML Engineers

Prerequisites

Knowledge Prerequisites

To be successful and get the most out of this OpenHack, familiarize yourself with the following:

Data Science:

DevOps:

Programming Languages:

  • Participants should have familiarity with programming languages like Python.

Tooling Prerequisites

To avoid any delays with downloading or installing tooling, you are encouraged to have the following ready to go!

Course Objectives

This OpenHack enables attendees to employ fundamental through advanced DevOps practices for the data science process, leveraging Azure Machine Learning service, Azure DevOps, Azure Data Factory, and other relevant Azure services. It simulates a real-world scenario in which an insurance company needs to predict the probability that a driver will initiate an auto insurance claim in the next year, and must take the data scientist's local functional model, along with the data used to train it, to production in a high-quality, secure, and scalable way.

During the “hacking”, attendees will focus on:

  1. Understanding DevOps fundamentals as applied to the Data Science process to train and deploy machine learning models
  2. Beginning to apply more advanced DevOps practices, such as canary rollouts or taking automated actions based on instrumentation

By the end of the OpenHack, attendees will have built a technical solution that automatically trains, evaluates, registers, and deploys a model; applies fundamental DevOps practices to the Data Ops used to train the model; and implements observability for the system.

Outline: OpenHack – DevOps for Data Science (OHDTSC)

Challenge 1: Local Model Build

  • Understand how the model build process works on a local machine/notebook for training and evaluation (a minimal sketch follows below)
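
As an illustration only, here is a minimal local build of the kind this challenge starts from, using scikit-learn; the file name, column names, and model choice are assumptions, not the official lab materials.

    # Hypothetical local training/evaluation loop for the claims-prediction scenario.
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("train.csv")                      # assumed local training file
    X = df.drop(columns=["id", "target"])              # assumed id/label columns
    y = df["target"]                                   # 1 = claim initiated next year
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    model = GradientBoostingClassifier()
    model.fit(X_train, y_train)
    val_pred = model.predict_proba(X_val)[:, 1]
    print(f"Validation AUC: {roc_auc_score(y_val, val_pred):.4f}")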

Challenge 2: Portable execution

  • Refactor the experimentation notebook to run as an experiment in the Azure Machine Learning service
  • Add metric and parameter logging (sketched below)
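
A minimal sketch of submitting the refactored script as an Azure ML experiment (SDK v1); the workspace config, compute target name, and file paths are assumptions.

    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace

    ws = Workspace.from_config()                       # reads a downloaded config.json
    env = Environment.from_conda_specification("train-env", "environment.yml")

    src = ScriptRunConfig(source_directory="src",
                          script="train.py",           # refactored from the notebook
                          compute_target="cpu-cluster",
                          environment=env)
    run = Experiment(ws, "claims-model").submit(src)
    run.wait_for_completion(show_output=True)

    # Inside train.py, metrics and parameters attach to the run:
    #   from azureml.core import Run
    #   run = Run.get_context()
    #   run.log("learning_rate", 0.1)
    #   run.log("val_auc", auc)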

Challenge 3: Scripted ML Pipeline Creation

  • Extend the notebook from Challenge 2 to use the Azure ML API to set up an ML pipeline that trains and registers the model
  • Retrieve the registered model, deploy it as an inferencing service to ACI, and test the deployed service (see the sketch below)
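
One way this can look with the Azure ML SDK (v1): a one-step pipeline followed by an ACI deployment. The model, script, and compute names are assumptions.

    from azureml.core import Environment, Experiment, Workspace
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AciWebservice
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import PythonScriptStep

    ws = Workspace.from_config()
    train_step = PythonScriptStep(name="train_and_register",
                                  script_name="train_register.py",  # logs metrics, calls Model.register
                                  source_directory="src",
                                  compute_target="cpu-cluster",
                                  allow_reuse=False)
    pipeline = Pipeline(workspace=ws, steps=[train_step])
    Experiment(ws, "claims-pipeline").submit(pipeline).wait_for_completion()

    model = Model(ws, name="claims-model")             # latest registered version
    env = Environment.from_conda_specification("score-env", "environment.yml")
    inference_config = InferenceConfig(entry_script="score.py",
                                       source_directory="src",
                                       environment=env)
    aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
    service = Model.deploy(ws, "claims-aci", [model], inference_config, aci_config)
    service.wait_for_deployment(show_output=True)
    print(service.scoring_uri)                         # used to test the service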

Challenge 4: Automated ML Pipeline Creation

  • DevOps pipeline integration incorporating MLOpsPython
  • Train, evaluate, and register a model via Azure DevOps and Azure ML Service Pipelines
  • Deploy and serve a model automatically after the model is registered in the model registry
  • Implement basic acceptance testing against the API endpoint in the pipeline after model deployment (a sample test is sketched below)
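
A minimal acceptance test of the kind a release pipeline could run after deployment; the SCORING_URI variable and the payload shape are assumptions (MLOpsPython wires comparable smoke tests into its Azure DevOps templates).

    import os
    import requests

    scoring_uri = os.environ["SCORING_URI"]            # injected by the release pipeline
    sample = {"data": [[0.5, 1.2, 3.4]]}               # hypothetical feature vector

    resp = requests.post(scoring_uri, json=sample, timeout=30)
    assert resp.status_code == 200, f"unexpected status: {resp.status_code}"
    print("prediction:", resp.json())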

Challenge 5: Observability: ML Training

  • Introduce a change that breaks the model during training in the pipeline, then diagnose why it broke using centralized, instrumented logging (one approach is sketched below)
  • Build a dashboard that reports on CPU, memory, or GPU utilization and fires an alert when utilization falls below or rises above a set threshold, ensuring optimal use of the hardware for cost and performance
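
One way to centralize training logs is to route Python logging to Application Insights via opencensus-ext-azure; the connection string and logged fields are assumptions.

    import logging

    import numpy as np
    from opencensus.ext.azure.log_exporter import AzureLogHandler
    from sklearn.linear_model import LogisticRegression

    logger = logging.getLogger("training")
    logger.setLevel(logging.INFO)
    logger.addHandler(AzureLogHandler(
        connection_string="InstrumentationKey=<your-key>"))  # assumed App Insights key

    X, y = np.random.rand(100, 3), np.random.randint(0, 2, 100)  # stand-in data
    try:
        LogisticRegression().fit(X, y)
        logger.info("training succeeded",
                    extra={"custom_dimensions": {"rows": len(X)}})
    except Exception:
        logger.exception("training failed")            # stack trace queryable in App Insights
        raise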

Challenge 6: Observability: ML Inference / Serving

  • Implement a custom metric in the model code that outputs the score, and build a dashboard to track this metric over time (sketched below)
  • Build a dashboard that reports on CPU, memory, or GPU utilization and fires an alert when utilization falls below or rises above a set threshold, ensuring optimal use of the hardware for cost and performance
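
A sketch of a score.py that emits the model's score as a loggable signal; with Application Insights enabled on the webservice (e.g., service.update(enable_app_insights=True)), printed output from run() is collected and can be charted over time. The model name is an assumption.

    import json

    import joblib
    from azureml.core.model import Model

    def init():
        global model
        model = joblib.load(Model.get_model_path("claims-model"))  # assumed name

    def run(raw_data):
        data = json.loads(raw_data)["data"]
        scores = model.predict_proba(data)[:, 1].tolist()
        # Emit the score in a parseable form so it can be queried as a custom metric.
        print(json.dumps({"metric": "claim_probability", "values": scores}))
        return {"scores": scores}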

Challenge 7: Data Ops for ML: Data Ingestion & Pre-processing

  • Understand the pros and cons of various Data Ingestion options available with Azure services.
  • Implement a solution in which a new data drop (e.g., a file landing in a blob store) automatically triggers a data ingestion pipeline that runs a Python notebook to process the data, writes the output file to a different folder on the blob store, and invokes the model training pipeline (one possible wiring is sketched below)
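
One possible wiring is an Azure Function with a blob trigger submitting a published Azure ML pipeline; an Azure Data Factory event trigger is an equally valid option to evaluate in this challenge. The pipeline ID, credentials, and parameter name are assumptions; the blob trigger itself is configured in the function's function.json.

    import azure.functions as func
    from azureml.core import Experiment, Workspace
    from azureml.core.authentication import ServicePrincipalAuthentication
    from azureml.pipeline.core import PublishedPipeline

    def main(newblob: func.InputStream):
        # Triggered when a file drops into the watched blob container.
        auth = ServicePrincipalAuthentication(
            tenant_id="<tenant>",
            service_principal_id="<app-id>",
            service_principal_password="<secret>")     # store in Key Vault in practice
        ws = Workspace.get(name="<workspace>", auth=auth,
                           subscription_id="<sub>", resource_group="<rg>")
        pipeline = PublishedPipeline.get(ws, id="<ingestion-pipeline-id>")
        # Pass the dropped blob's path so the notebook step knows what to process.
        Experiment(ws, "data-ingestion").submit(
            pipeline, pipeline_parameters={"input_path": newblob.name})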

Challenge 8: Data Ops for ML: CI/CD Pipelines as Code

  • Store the source code of the data ingestion solution in an SCM repository.
  • Implement a branching policy for the data pipeline and the notebooks.
  • Implement CI/CD pipelines that deploy the data ingestion pipeline and the notebooks to the target environment. Parameterize the solution so it can be deployed to multiple environments (a parameterized deployment script is sketched below).
  • Store the CI/CD pipeline definitions in the SCM repository as well.
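
A sketch of what "parameterized" can mean here: a deployment script, versioned in the repo and invoked by the CI/CD pipeline, that selects per-environment settings. The flag names and config layout are assumptions.

    import argparse
    import json

    from azureml.core import Workspace

    parser = argparse.ArgumentParser()
    parser.add_argument("--environment", choices=["dev", "qa", "prod"], required=True)
    args = parser.parse_args()

    # Per-environment settings are versioned alongside the code they deploy.
    with open(f"config/{args.environment}.json") as f:
        cfg = json.load(f)

    ws = Workspace.get(name=cfg["workspace"],
                       subscription_id=cfg["subscription_id"],
                       resource_group=cfg["resource_group"])
    print(f"Deploying ingestion pipeline and notebooks to {args.environment} ({ws.name})")
    # ... publish the data ingestion pipeline and upload the notebooks here ...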

Challenge 9: Advanced: Canary deployment

  • Set up a canary deployment pipeline for the model being served to production users (one approach is sketched below)
  • Analyze the score metric on the dashboard for the canary versus the production deployment
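
One approach uses Azure ML (v1) AKS endpoint versions, which split traffic between a control and a canary deployment; the endpoint name, model version, and traffic split are assumptions, and the exact mechanism is up to the team.

    from azureml.core import Environment, Workspace
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AksEndpoint

    ws = Workspace.from_config()
    model_v2 = Model(ws, name="claims-model", version=2)   # the candidate model
    env = Environment.from_conda_specification("score-env", "environment.yml")
    inference_config = InferenceConfig(entry_script="score.py",
                                       source_directory="src",
                                       environment=env)

    # Route 10% of traffic to the canary version; the rest stays on control.
    endpoint = AksEndpoint(ws, "claims-endpoint")          # existing endpoint
    endpoint.create_version(version_name="canary",
                            models=[model_v2],
                            inference_config=inference_config,
                            traffic_percentile=10)
    endpoint.wait_for_deployment(show_output=True)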

Prices & Delivery methods

Online Training

Duration
3 days

Price
  • CAD 13,500
  • US$ 10,000

Classroom Training

Duration
3 days

Price
  • Canada: CAD 13,500

Schedule

Currently there are no training dates scheduled for this course.