Course Overview
This course provides a foundational overview of the hardware, software, and networking components required to develop and manage AI models at scale. It explores Google Cloud's AI Hypercomputer architecture, compares compute accelerators like GPUs and TPUs, and examines the critical data pipelines and storage solutions necessary to maximize training performance.
Who should attend
IT decision-makers and infrastructure architects looking to understand the technical requirements and the AI Hypercomputer’s offerings for enterprise-grade AI deployment.
Prerequisites
Familiarity with cloud computing concepts and general data center infrastructure.
Course Objectives
- Differentiate between the layers of the AI Hypercomputer.
- Select appropriate accelerators to run AI workloads cost-effectively.
- Evaluate storage and networking solutions to maximize training goodput.
- Compare various deployment and consumption models for resource optimization.
Outline: AI Infrastructure Essentials (AIIE)
Module 1 - Foundations of AI Infrastructure
Topics:
- Definition of AI infrastructure
- The evolution of computing demands
- The need for new computing power
Objectives:
- N/A
Activities:
- N/A
Module 2 - Google Cloud’s AI Hypercomputer
Topics:
- The AI Hypercomputer
- Overview of the three layers of the AI Hypercomputer
Objectives:
- Differentiate between the layers of the AI Hypercomputer.
Activities:
- N/A
Module 3 - Compute Accelerators: GPUs and TPUs
Topics:
- Graphics Processing Units
- GPU architecture
- Google Cloud GPU family
- Selecting GPUs
- Tensor Processing Units
- TPU architecture
- Google Cloud TPU family
- Best practices and considerations
Objectives:
- Select appropriate accelerators to run AI workloads cost-effectively.
Activities:
- 1x exercise/discussion
Module 4 - The AI Data Pipeline: Network and Storage
Topics:
- Maximizing goodput
- Networking for data ingestion and training
- Storage for data preparation and training
- Architecture for inference
Objectives:
- Evaluate storage and networking solutions to maximize training goodput.
Activities:
- 1x discussion
Module 5 - Orchestration and Consumption
Topics:
- Deployment options
- Flexible consumption
Objectives:
- Compare various deployment and consumption models for resource optimization.
Activities:
- N/A
Module 6 - Course Summary and Quiz
Topics:
- Course summary
- Q&A
- Quiz
Objectives:
- Differentiate between the layers of the AI Hypercomputer.
- Select appropriate accelerators to run AI workloads cost-effectively.
- Evaluate storage and networking solutions to maximize training goodput.
- Compare various deployment and consumption models for resource optimization.
Activities:
- 1x quiz with 4 MCQs