Modernizing Data Lakes and Data Warehouses with Google Cloud (MDLDW)

 

Course Overview

Data lakes and data warehouses are the two main components of any data pipeline. This course highlights use cases for each type of storage and examines, in technical detail, the data lake and data warehouse solutions available on Google Cloud. It also describes the role of a data engineer, explains the benefits of a successful data pipeline to business operations, and examines why data engineering should be done in a cloud environment.

Who should attend

This course is intended for developers who are responsible for querying datasets, visualizing query results, and creating reports.

Specific job roles include:

  • Data engineer
  • Data analyst
  • Database administrator
  • Big data architect

Prerequisites

Basic proficiency with a common query language such as SQL.

Course Objectives

  • Differentiate between data lakes and data warehouses.
  • Explore use cases for each type of storage and the available data lake and data warehouse solutions on Google Cloud.
  • Discuss the role of a data engineer and the benefits of a successful data pipeline to business operations.
  • Examine why data engineering should be done in a cloud environment.

Outline: Modernizing Data Lakes and Data Warehouses with Google Cloud (MDLDW)

Module 1 - Introduction to Data Engineering

Topics:

  • The role of a data engineer
  • Data engineering challenges
  • Introduction to BigQuery
  • Data lakes and data warehouses
  • Transactional databases versus data warehouses
  • Partnering effectively with other data teams
  • Managing data access and governance
  • Building production-ready pipelines
  • Google Cloud customer case study

Objectives:

  • Discuss the role of a data engineer.
  • Discuss benefits of doing data engineering in the cloud.
  • Discuss the challenges of data engineering practice and how building data pipelines in the cloud helps to address them.
  • Review the purpose of a data lake versus a data warehouse, and understand when to use each.

Module 2 - Building a Data Lake

Topics:

  • Introduction to data lakes
  • Data storage and ETL options on Google Cloud
  • Building a data lake by using Cloud Storage
  • Securing Cloud Storage
  • Storing all sorts of data types
  • Cloud SQL as your OLTP system

Objectives:

  • Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
  • Explain how to use Cloud SQL for a relational data lake.

Module 3 - Building a Data Warehouse

Topics:

  • The modern data warehouse
  • Introduction to BigQuery
  • Getting started with BigQuery
  • Loading data into BigQuery
  • Exploring schemas
  • Schema design
  • Nested and repeated fields
  • Optimizing with partitioning and clustering

Objectives:

  • Discuss the requirements of a modern data warehouse.
  • Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
  • Discuss the core concepts of BigQuery and review the options for loading data into BigQuery.

Prices & Delivery methods

Online Training

Duration
1 day

Price
  • Online Training: CAD 785
  • Online Training: US$ 595

Classroom Training

Duration
1 day

Price
  • Canada: CAD 785

Schedule

This is an Instructor-Led Classroom course
This is a FLEX course, which is delivered both virtually and in the classroom.

Italy

Milan (FLEX course)
Rome (FLEX course)
Online Training (time zone: Europe/Rome)