GCP200DE

Data Engineering on Google Cloud

Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data.

This course consists of the following four courses:

  • Introduction to Data Engineering on Google Cloud
  • Build Data Lakes and Data Warehouses with Google Cloud
  • Build Batch Data Pipelines on Google Cloud
  • Build Streaming Data Pipelines on Google Cloud
✓ Official Google Cloud training • Level: Intermediate • ⏱️ Duration: 4 days (28h)

What you will learn

  • Design scalable data processing systems in Google Cloud.
  • Differentiate data architectures and implement data lakehouse and pipeline concepts.
  • Build and manage robust streaming and batch data pipelines.
  • Utilize AI/ML tools to optimize performance and gain insights from processes and data.

Prerequisites

  • Understanding of data engineering principles, including ETL/ELT processes, data modeling, and common data formats (Avro, Parquet, JSON).
  • Familiarity with data architecture concepts, specifically Data Warehouses and Data Lakes.
  • Proficiency in SQL for data querying.
  • Proficiency in a common programming language (Python recommended).
  • Familiarity with using Command Line Interfaces (CLI).
  • Familiarity with core Google Cloud concepts and services (Compute, Storage, and Identity management).

Target audience

  • Data Engineers, Data Analysts, Data Architects

Training Program

19 modules to master the fundamentals

Course 1 : Introduction to Data Engineering on Google Cloud

Objectives
  • Explain the role of a data engineer.
  • Understand the differences between a data source and a data sink.
  • Explain the different types of data formats.
  • Explain the storage solution options on Google Cloud.
  • Learn about the metadata management options on Google Cloud.
  • Understand how to share datasets with ease using Analytics Hub.
  • Understand how to load data into BigQuery using the Google Cloud console or the gcloud CLI.
Topics covered
  • The role of a data engineer
  • Data sources versus data sinks
  • Data formats
  • Storage solution options on Google Cloud
  • Metadata management options on Google Cloud
  • Sharing datasets using Analytics Hub
Activities

Lab: Loading Data into BigQuery

Quiz
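As a taste of this module's lab, here is a minimal sketch of loading CSV files from Cloud Storage into BigQuery with the Python client library (google-cloud-bigquery). The project, bucket, and table names are placeholders, not the resources used in the lab.

    from google.cloud import bigquery

    # Uses Application Default Credentials (e.g. from `gcloud auth`).
    client = bigquery.Client()

    table_id = "my-project.my_dataset.my_table"  # placeholder names

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # infer the schema from the data
    )

    # Start a load job from Cloud Storage and wait for it to finish.
    load_job = client.load_table_from_uri(
        "gs://my-bucket/data/*.csv", table_id, job_config=job_config
    )
    load_job.result()

    print(f"Loaded {client.get_table(table_id).num_rows} rows.")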

Objectives
  • Explain the baseline Google Cloud data replication and migration architecture.
  • Understand the options and use cases for the gcloud command-line tool.
  • Explain the functionality and use cases for Storage Transfer Service.
  • Explain the functionality and use cases for Transfer Appliance.
  • Understand the features and deployment of Datastream.
Topics covered
  • Replication and migration architecture
  • The gcloud command-line tool
  • Moving datasets
  • Datastream
Objectives
  • Explain the baseline extract and load architecture diagram.
  • Understand the options of the bq command-line tool.
  • Explain the functionality and use cases for BigQuery Data Transfer Service.
  • Explain the functionality and use cases for BigLake as a non-extract-load pattern.
Topics covered
  • Extract and load architecture
  • The bq command-line tool
  • BigQuery Data Transfer Service
  • BigLake
Activities

Lab: BigLake: Qwik Start

Quiz
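To illustrate the BigLake pattern covered here, the sketch below creates a BigLake table over Parquet files in Cloud Storage using BigQuery DDL issued from Python. The connection, dataset, and bucket names are placeholders, and a Cloud resource connection is assumed to already exist.

    from google.cloud import bigquery

    client = bigquery.Client()

    # BigLake table over files in Cloud Storage: the data stays in GCS,
    # and BigQuery reads it through the named Cloud resource connection.
    ddl = """
    CREATE OR REPLACE EXTERNAL TABLE my_dataset.sales_biglake
    WITH CONNECTION `us.my-connection`
    OPTIONS (
      format = 'PARQUET',
      uris = ['gs://my-bucket/sales/*.parquet']
    );
    """
    client.query(ddl).result()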

Objectives
  • Explain the baseline extract, load, and transform architecture diagram.
  • Understand a common ELT pipeline on Google Cloud.
  • Learn about BigQuery's SQL scripting and scheduling capabilities.
  • Explain the functionality and use cases for Dataform.
Topics covered
  • Extract, load, and transform (ELT) architecture
  • SQL scripting and scheduling with BigQuery
  • Dataform
Activities

Lab: Create and Execute a SQL Workflow in Dataform

Quiz
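For a sense of BigQuery SQL scripting, here is a small multi-statement script run through the Python client; in practice the same SQL would typically be scheduled as a BigQuery scheduled query or managed as a Dataform workflow. Table and column names are invented for the example.

    from google.cloud import bigquery

    client = bigquery.Client()

    # A multi-statement script: declare a variable, then transform raw
    # data into a summary table (the "T" of ELT, inside BigQuery).
    script = """
    DECLARE run_date DATE DEFAULT CURRENT_DATE();

    CREATE OR REPLACE TABLE my_dataset.daily_order_totals AS
    SELECT customer_id, SUM(amount) AS total_amount
    FROM my_dataset.raw_orders
    WHERE DATE(order_ts) = run_date
    GROUP BY customer_id;
    """
    client.query(script).result()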

Objectives
  • Explain the baseline extract, transform, and load architecture diagram.
  • Learn about the GUI tools on Google Cloud used for ETL data pipelines.
  • Explain batch data processing using Dataproc.
  • Learn how to use Dataproc Serverless for Spark for ETL.
  • Explain streaming data processing options.
  • Explain the role Bigtable plays in data pipelines.
Topics covered
  • Extract, transform, and load (ETL) architecture
  • Google Cloud GUI tools for ETL data pipelines
  • Batch data processing using Dataproc
  • Streaming data processing options
  • Bigtable and data pipelines
Activities

Lab: Use Dataproc Serverless for Spark to Load BigQuery (optional)

Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow

Quiz
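The ETL pattern with Dataproc Serverless for Spark might look like the PySpark sketch below: read raw CSV from Cloud Storage, clean it, and write to BigQuery through the Spark-BigQuery connector (preinstalled on Dataproc Serverless). Bucket and table names are placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: raw CSV files from Cloud Storage.
    raw = spark.read.option("header", True).csv("gs://my-bucket/raw/orders/*.csv")

    # Transform: drop incomplete rows and fix up types.
    cleaned = (
        raw.dropna(subset=["order_id"])
           .withColumn("amount", F.col("amount").cast("double"))
    )

    # Load: write to BigQuery via the Spark-BigQuery connector.
    (cleaned.write.format("bigquery")
        .option("table", "my_dataset.orders")
        .option("temporaryGcsBucket", "my-temp-bucket")
        .mode("overwrite")
        .save())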

Objectives
  • Explain the automation patterns and options available for pipelines.
  • Learn about Cloud Scheduler and Workflows.
  • Learn about Cloud Composer.
  • Learn about Cloud Run functions.
  • Explain the functionality and automation use cases for Eventarc.
Topics covered
  • Automation patterns and options for pipelines
  • Cloud Scheduler and Workflows
  • Cloud Composer
  • Cloud Run functions
  • Eventarc
Activities

Lab: Use Cloud Run Functions to Load BigQuery (optional)

Quiz
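As an illustration of the event-driven automation this module describes, the sketch below is a Cloud Run function that fires when a file lands in a Cloud Storage bucket and loads it into BigQuery. It assumes an Eventarc trigger on the bucket, and the table name is a placeholder.

    import functions_framework
    from google.cloud import bigquery

    @functions_framework.cloud_event
    def gcs_to_bigquery(cloud_event):
        # Eventarc delivers a CloudEvent when an object is finalized.
        data = cloud_event.data
        uri = f"gs://{data['bucket']}/{data['name']}"

        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
        )
        # Append the new file into the landing table.
        client.load_table_from_uri(
            uri, "my_dataset.landing_table", job_config=job_config
        ).result()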

Course 2 : Build Data Lakes and Data Warehouses with Google Cloud

Objectives
  • Compare and contrast data lake, data warehouse, and data lakehouse architectures.
  • Evaluate the benefits of the lakehouse approach.
Topics covered
  • The classics: Data lakes and data warehouses
  • The modern approach: Data lakehouse
  • Choosing the right architecture
Activities

Quiz

Objectives
  • Discuss data storage options, including Cloud Storage for files, open table formats like Apache Iceberg, BigQuery for analytic data, and AlloyDB for operational data.
  • Understand the role of AlloyDB for operational data use cases.
Topics covered
  • Building a data lake foundation
  • Introduction to the Apache Iceberg open table format
  • BigQuery as the central processing engine
  • Combining operational data in AlloyDB
  • Combining operational and analytical data with federated queries
  • Real-world use case
Activities

Quiz

Lab: Federated Query with BigQuery
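A federated query like the one in the lab lets BigQuery read operational data in place. The sketch below queries an AlloyDB (or Cloud SQL) source through EXTERNAL_QUERY; the connection ID, tables, and columns are placeholders.

    from google.cloud import bigquery

    client = bigquery.Client()

    # EXTERNAL_QUERY pushes the inner SQL down to the operational
    # database and joins the result with analytical data in BigQuery.
    sql = """
    SELECT w.sku, w.units_on_hand, s.units_sold_30d
    FROM EXTERNAL_QUERY(
      'my-project.us.my-alloydb-connection',
      'SELECT sku, units_on_hand FROM inventory'
    ) AS w
    JOIN my_dataset.sales_summary AS s USING (sku);
    """
    for row in client.query(sql).result():
        print(row.sku, row.units_on_hand, row.units_sold_30d)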

Objectives
  • Explain why BigQuery is a scalable data warehousing solution on Google Cloud.
  • Discuss the core concepts of BigQuery.
  • Understand BigLake's role in creating a unified lakehouse architecture and its integration with BigQuery for external data.
  • Learn how BigQuery natively interacts with Apache Iceberg tables via BigLake.
Topics covered
  • BigQuery fundamentals
  • Partitioning and clustering in BigQuery
  • Introducing BigLake and external tables
Activities

Quiz

Lab: Querying External Data and Iceberg Tables
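To make the partitioning and clustering topic concrete, here is a minimal DDL sketch (issued from Python) that creates a table partitioned by day and clustered on two columns; the schema is invented for illustration.

    from google.cloud import bigquery

    client = bigquery.Client()

    ddl = """
    CREATE TABLE IF NOT EXISTS my_dataset.events
    (
      event_ts TIMESTAMP,
      user_id  STRING,
      country  STRING,
      payload  JSON
    )
    -- Partition pruning cuts scanned bytes for date-bounded queries;
    -- clustering co-locates rows that are commonly filtered together.
    PARTITION BY DATE(event_ts)
    CLUSTER BY country, user_id;
    """
    client.query(ddl).result()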

Objectives
  • Implement robust data governance and security practices across the unified data platform, including sensitive data protection and metadata management.
  • Explore advanced analytics and machine learning directly on lakehouse data.
Topics covered
  • Data governance and security in a unified platform
  • Demo: Data Loss Prevention
  • Analytics and machine learning on the lakehouse
  • Real-world lakehouse architectures and migration strategies
Activities

Quiz

Objectives
  • Reinforce the core principles of Google Cloud's data platform.
Topics covered
  • Review
  • Best practices
Activities

Lab: Getting Started with BigQuery ML

Lab: Vector Search with BigQuery
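In the spirit of the BigQuery ML lab, this sketch trains a simple logistic regression model and scores new rows, all in SQL driven from Python; dataset, table, and column names are placeholders.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Train a classifier directly on lakehouse data with BigQuery ML.
    train = """
    CREATE OR REPLACE MODEL my_dataset.churn_model
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT tenure_months, monthly_spend, support_tickets, churned
    FROM my_dataset.customers;
    """
    client.query(train).result()

    # Score new rows with ML.PREDICT; the predicted label column is
    # named predicted_<label>, here predicted_churned.
    predict = """
    SELECT customer_id, predicted_churned
    FROM ML.PREDICT(
      MODEL my_dataset.churn_model,
      (SELECT customer_id, tenure_months, monthly_spend, support_tickets
       FROM my_dataset.new_customers)
    );
    """
    for row in client.query(predict).result():
        print(row.customer_id, row.predicted_churned)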

Course 3 : Build Batch Data Pipelines on Google Cloud

Objectives
  • Explain the critical role of a data engineer in developing and maintaining batch data pipelines.
  • Describe the core components and typical lifecycle of batch data pipelines from ingestion to downstream consumption.
  • Analyze common challenges in batch data processing, such as data volume, quality, complexity, and reliability, and identify key Google Cloud services that can address them.
Topics covered
  • Batch data pipelines and their use cases
  • Processing and common challenges
Activities

Quiz

Objectives
  • Design scalable batch data pipelines for high-volume data ingestion and transformation.
  • Optimize batch jobs for high throughput and cost-efficiency using various resource management and performance tuning techniques.
Topics covered
  • Design batch pipelines
  • Large-scale data transformations
  • Dataflow and Serverless for Apache Spark
  • Data connections and orchestration
  • Execute an Apache Spark pipeline
  • Optimize batch pipeline performance
Activities

Quiz

Lab: Build a Simple Batch Data Pipeline with Serverless for Apache Spark (optional)

Lab: Build a Simple Batch Data Pipeline with Dataflow Job Builder UI (optional)
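A minimal Apache Beam batch pipeline of the kind built in these labs might look like the sketch below: read CSV from Cloud Storage, parse each line, and write to BigQuery. It runs locally with the DirectRunner or on Dataflow by setting the runner option; all names are placeholders.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_line(line):
        # Each line is "order_id,amount"; convert to a BigQuery row dict.
        order_id, amount = line.split(",")
        return {"order_id": order_id, "amount": float(amount)}

    with beam.Pipeline(options=PipelineOptions()) as p:
        (
            p
            | "Read CSV" >> beam.io.ReadFromText(
                  "gs://my-bucket/orders/*.csv", skip_header_lines=1)
            | "Parse" >> beam.Map(parse_line)
            | "Write BQ" >> beam.io.WriteToBigQuery(
                  "my-project:my_dataset.orders",
                  schema="order_id:STRING,amount:FLOAT",
                  create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                  write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
        )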

Objectives
  • Develop data validation rules and cleansing logic to ensure data quality within batch pipelines.
  • Implement strategies for managing schema evolution and performing data deduplication in large datasets.
Topics covered
  • Batch data validation and cleansing
  • Log and analyze errors
  • Schema evolution for batch pipelines
  • Data integrity and duplication
  • Deduplication with Serverless for Apache Spark
  • Deduplication with Dataflow
Activities

Lab: Validate Data Quality in a Batch Pipeline with Serverless for Apache Spark (optional)

Quiz
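To sketch the deduplication topic in PySpark (the Serverless for Apache Spark variant), here are two common approaches: dropping exact key duplicates, and deterministically keeping only the most recent record per key. Column names are invented.

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dedup-sketch").getOrCreate()
    df = spark.read.option("header", True).csv("gs://my-bucket/raw/orders/*.csv")

    # Approach 1: keep one arbitrary row per business key.
    deduped = df.dropDuplicates(["order_id"])

    # Approach 2: keep the most recent record per key, deterministically.
    w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
    latest = (
        df.withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn")
    )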

Objectives
  • Orchestrate complex batch data pipeline workflows for efficient scheduling and lineage tracking.
  • Implement robust error handling, monitoring, and observability for batch data pipelines.
Topics covered
  • Orchestration for batch processing
  • Cloud Composer
  • Unified observability
  • Alerts and troubleshooting
  • Visual pipeline management
Activities

Lab: Building Batch Pipelines in Cloud Data Fusion

Quiz
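As an orchestration sketch for this module, here is a small Airflow DAG of the kind run on Cloud Composer (assuming Airflow 2.x) that schedules a daily BigQuery transformation. The DAG ID, table names, and query are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import (
        BigQueryInsertJobOperator,
    )

    with DAG(
        dag_id="daily_batch_pipeline",
        schedule_interval="@daily",   # run once per day
        start_date=datetime(2026, 1, 1),
        catchup=False,
    ) as dag:
        # One task: rebuild the daily summary table in BigQuery.
        transform = BigQueryInsertJobOperator(
            task_id="transform_orders",
            configuration={
                "query": {
                    "query": """
                        CREATE OR REPLACE TABLE my_dataset.daily_totals AS
                        SELECT order_id, SUM(amount) AS total
                        FROM my_dataset.orders
                        GROUP BY order_id
                    """,
                    "useLegacySql": False,
                }
            },
        )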

Course 4 : Build Streaming Data Pipelines on Google Cloud

Objectives
  • Introduce the course learning objectives and the scenario that brings hands-on learning to building streaming data pipelines.
  • Describe the concept of streaming data pipelines, the challenges associated with them, and the role of these pipelines within the data engineering process.
Topics covered
  • Course learning objectives
  • Course prerequisites
  • The use case
  • About the company
  • The challenge
  • The mission
Objectives
  • Understand common streaming use cases and their applications, including streaming ETL, streaming AI/ML, streaming applications, and reverse ETL.
  • Identify and describe sample architectures for each of these streaming patterns.
Topics covered
  • Introduction to streaming data pipelines on Google Cloud
  • Streaming ETL
  • Streaming AI/ML
  • Streaming applications
  • Reverse ETL
Activities

Quiz

Objectives
  • Pub/Sub and Managed Service for Apache Kafka: Define messaging concepts and know when to use Pub/Sub versus Managed Service for Apache Kafka.
  • Dataflow: Describe the service and the challenges of streaming data, and build and deploy a streaming pipeline.
  • BigQuery: Explore data ingestion methods; use BigQuery continuous queries, BigQuery ETL, and reverse ETL; configure Pub/Sub-to-BigQuery streaming; and architect BigQuery streaming pipelines.
  • Bigtable: Describe the big picture of data movement and interaction, establish a streaming pipeline from Dataflow to Bigtable, analyze the Bigtable continuous data stream for trends using BigQuery, and synchronize the trend analysis back into the user-facing application.
Topics covered
  • Understanding the products
  • Architectural considerations for Pub/Sub and Managed Service for Apache Kafka
  • Dataflow: The processing powerhouse
  • BigQuery: The analytical engine
  • Bigtable: The solution for operational data
Activities

Lab: Stream data with pipelines - Esports use case (optional)

Quiz

Lab: Use Apache Beam and Bigtable to enrich esports downloadable content (DLC) data

Quiz

Lab: Stream e-sports data with Pub/Sub and BigQuery

Quiz

Lab: Monitor e-sports chat with Streamlit

Quiz
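Tying the streaming products together, here is a minimal Apache Beam streaming sketch in the spirit of these labs: read game events from Pub/Sub and stream them into BigQuery. The topic, table, and field names are placeholders, not the labs' actual resources.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def to_row(message_bytes):
        # Pub/Sub delivers raw bytes; the payload is assumed to be JSON.
        event = json.loads(message_bytes.decode("utf-8"))
        return {"player_id": event["player_id"], "score": event["score"]}

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read Pub/Sub" >> beam.io.ReadFromPubSub(
                  topic="projects/my-project/topics/game-events")
            | "Parse" >> beam.Map(to_row)
            | "Write BQ" >> beam.io.WriteToBigQuery(
                  "my-project:esports.live_scores",
                  schema="player_id:STRING,score:INTEGER",
                  write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )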

Topics covered
  • What you've accomplished
  • Next steps

Quality Process

SFEIR Institute's commitment: an excellence-driven approach that ensures the quality and success of all our training programs. Learn more about our quality approach.

Teaching Methods Used
  • Lectures / Theoretical Slides — Presentation of concepts using visual aids (PowerPoint, PDF).
  • Technical Demonstration (Demos) — The instructor performs a task or procedure while students observe.
  • Guided Labs — Guided practical exercises on software, hardware, or technical environments.
  • Quiz / MCQ — Quick knowledge check (paper-based or digital via tools like Kahoot/Klaxoon).
Evaluation and Monitoring System

The achievement of training objectives is evaluated at multiple levels to ensure quality:

  • Continuous Knowledge Assessment: Verification of knowledge throughout the training via participatory methods (quizzes, practical exercises, case studies) under instructor supervision.
  • Progress Measurement: Comparative self-assessment system including an initial diagnostic to determine the starting level, followed by a final evaluation to validate skills development.
  • Quality Evaluation: End-of-session satisfaction questionnaire to measure the relevance and effectiveness of the training as perceived by participants.

Upcoming sessions

February 9, 2026
Remote • French
Register
April 27, 2026
Remote • French
Register
June 29, 2026
Remote • French
Register
August 31, 2026
Remote • French
Register
October 26, 2026
Remote • French
Register
December 14, 2026
Remote • French
Register

3,160€ excl. VAT

per learner