Course Overview
This 1-day course focuses on building intelligent applications that can see, interpret, and reason over images and documents using multimodal models and agent-based tools. Learners explore how visual and document inputs can be combined with language models to enable structured extraction, analysis, and decision-making workflows. The course emphasizes practical patterns for extracting information, orchestrating tools, and grounding model responses in visual data.
Course Content
Develop a vision-enabled generative AI application
- Introduction
- Use a vision-capable model in the Microsoft Foundry portal
- Develop a vision-based chat app
- Exercise - Develop a vision-enabled chat app
- Module assessment
- Summary
Generate images with AI
- Introduction
- What are image-generation models?
- Explore image-generation models in Microsoft Foundry portal
- Create a client application that uses an image-generation model
- Exercise - Generate images with AI
- Module assessment
- Summary
Generate videos with Microsoft Foundry
- Introduction
- Deploy a video-generation model
- Generate video from a prompt
- Generate video in Python
- Exercise - Generate video with Sora 2 in Microsoft Foundry
- Module assessment
- Summary
Analyze images with Content Understanding
- Introduction
- What is Content Understanding?
- Analyze images with Content Understanding
- Exercise - Analyze images with Content Understanding
- Module assessment
- Summary
Create a multimodal analysis solution with Azure Content Understanding
- Introduction
- What is Azure Content Understanding?
- Create a Content Understanding analyzer
- Use the Content Understanding API
- Exercise - Extract information from multimodal content
- Module assessment
- Summary
Create an Azure Content Understanding client application
- Introduction
- Prepare to use the AI Content Understanding API
- Create a Content Understanding analyzer
- Analyze content
- Exercise - Develop a Content Understanding client application
- Module assessment
- Summary
Extract data with Azure Document Intelligence
- Introduction
- What is Azure Document Intelligence?
- Use the Document Intelligence Studio
- Use prebuilt models
- Train and use custom models
- Exercise - Analyze documents with Document Intelligence
- Module assessment
- Summary
Create a knowledge mining solution with Azure AI Search
- Introduction
- What is Azure AI Search?
- Extract data with an indexer
- Enrich extracted data with AI skills
- Search an index
- Persist extracted information in a knowledge store
- Exercise - Create a knowledge mining solution
- Module assessment
- Summary
Who should attend
This course is designed for developers, AI engineers, and technical professionals who want to build applications that work with images and documents using multimodal, agent-driven approaches. It’s best suited for learners with basic programming experience and a general understanding of cloud or AI concepts.