GCP200DE

Data Engineering on Google Cloud

Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data.

This course comprises the following four courses:

  • Introduction to Data Engineering on Google Cloud
  • Build Data Lakes and Data Warehouses with Google Cloud
  • Build Batch Data Pipelines on Google Cloud
  • Build Streaming Data Pipelines on Google Cloud
Google Cloud
✓ Official Google Cloud training • Level: Intermediate • ⏱️ 4 days (28h)

What you will learn

  • Design scalable data processing systems in Google Cloud.
  • Differentiate data architectures and implement data lakehouse and pipeline concepts.
  • Build and manage robust streaming and batch data pipelines.
  • Utilize AI/ML tools to optimize performance and gain process and data insights.

Prerequisites

  • Understanding of data engineering principles, including ETL/ELT processes, data modeling, and common data formats (Avro, Parquet, JSON).
  • Familiarity with data architecture concepts, specifically Data Warehouses and Data Lakes.
  • Proficiency in SQL for data querying.
  • Proficiency in a common programming language (Python recommended).
  • Familiarity with using Command Line Interfaces (CLI).
  • Familiarity with core Google Cloud concepts and services (Compute, Storage, and Identity management).

Target audience

  • Data Engineers, Data Analysts, Data Architects

Training Program

19 modules to master the fundamentals

Course 1: Introduction to Data Engineering on Google Cloud

Objectives
  • Explain the role of a data engineer.
  • Understand the differences between a data source and a data sink.
  • Explain the different types of data formats.
  • Explain the storage solution options on Google Cloud.
  • Learn about the metadata management options on Google Cloud.
  • Understand how to share datasets with ease using Analytics Hub.
  • Understand how to load data into BigQuery using the Google Cloud console or the gcloud CLI.
Topics covered
  • The role of a data engineer
  • Data sources versus data sinks
  • Data formats
  • Storage solution options on Google Cloud
  • Metadata management options on Google Cloud
  • Sharing datasets using Analytics Hub
Activities

Lab: Loading Data into BigQuery

Quiz

Objectives
  • Explain the baseline Google Cloud data replication and migration architecture.
  • Understand the options and use cases for the gcloud command-line tool.
  • Explain the functionality and use cases for Storage Transfer Service.
  • Explain the functionality and use cases for Transfer Appliance.
  • Understand the features and deployment of Datastream.
Topics covered
  • Replication and migration architecture
  • The gcloud command-line tool
  • Moving datasets
  • Datastream
Objectives
  • Explain the baseline extract and load architecture diagram.
  • Understand the options of the bq command-line tool.
  • Explain the functionality and use cases for BigQuery Data Transfer Service.
  • Explain the functionality and use cases for BigLake as a non-extract-load pattern.
Topics covered
  • Extract and load architecture
  • The bq command-line tool
  • BigQuery Data Transfer Service
  • BigLake
Activities

Lab: BigLake: Qwik Start

Quiz

Objectives
  • Explain the baseline extract, load, and transform architecture diagram.
  • Understand a common ELT pipeline on Google Cloud.
  • Learn about BigQuery's SQL scripting and scheduling capabilities.
  • Explain the functionality and use cases for Dataform.
Topics covered
  • Extract, load, and transform (ELT) architecture
  • SQL scripting and scheduling with BigQuery
  • Dataform
Activities

Lab: Create and Execute a SQL Workflow in Dataform

Quiz

Objectives
  • Explain the baseline extract, transform, and load architecture diagram.
  • Learn about the GUI tools on Google Cloud used for ETL data pipelines.
  • Explain batch data processing using Dataproc.
  • Learn how to use Dataproc Serverless for Spark for ETL.
  • Explain streaming data processing options.
  • Explain the role Bigtable plays in data pipelines.
Topics covered
  • Extract, transform, and load (ETL) architecture
  • Google Cloud GUI tools for ETL data pipelines
  • Batch data processing using Dataproc
  • Streaming data processing options
  • Bigtable and data pipelines
Activities

Lab: Use Dataproc Serverless for Spark to Load BigQuery (optional)

Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow

Quiz

Objectives
  • Explain the automation patterns and options available for pipelines.
  • Learn about Cloud Scheduler and Workflows.
  • Learn about Cloud Composer.
  • Learn about Cloud Run functions.
  • Explain the functionality and automation use cases for Eventarc.
Topics covered
  • Automation patterns and options for pipelines
  • Cloud Scheduler and Workflows
  • Cloud Composer
  • Cloud Run functions
  • Eventarc
Activities

Lab: Use Cloud Run Functions to Load BigQuery (optional)

Quiz

Course 2: Build Data Lakes and Data Warehouses with Google Cloud

Objectives
  • Compare and contrast data lake, data warehouse, and data lakehouse architectures.
  • Evaluate the benefits of the lakehouse approach.
Topics covered
  • The classics: Data lakes and data warehouses
  • The modern approach: Data lakehouse
  • Choosing the right architecture
Activities

Quiz

Objectives
  • Discuss data storage options, including Cloud Storage for files, open table formats like Apache Iceberg, BigQuery for analytic data, and AlloyDB for operational data.
  • Understand the role of AlloyDB for operational data use cases.
Topics covered
  • Building a data lake foundation
  • Introduction to Apache Iceberg open table format
  • BigQuery as the central processing engine
  • Combining operational data in AlloyDB
  • Combining operational and analytical data with federated queries
  • Real-world use case
Activities

Quiz

Lab: Federated Query with BigQuery

Objectives
  • Explain why BigQuery is a scalable data warehousing solution on Google Cloud.
  • Discuss the core concepts of BigQuery.
  • Understand BigLake's role in creating a unified lakehouse architecture and its integration with BigQuery for external data.
  • Learn how BigQuery natively interacts with Apache Iceberg tables via BigLake.
Topics covered
  • BigQuery fundamentals
  • Partitioning and clustering in BigQuery
  • Introducing BigLake and external tables
Activities

Quiz

Lab: Querying External Data and Iceberg Tables

Objectives
  • Implement robust data governance and security practices across the unified data platform, including sensitive data protection and metadata management.
  • Explore advanced analytics and machine learning directly on lakehouse data.
Topics covered
  • Data governance and security in a unified platform
  • Demo: Data Loss Prevention
  • Analytics and machine learning on the lakehouse
  • Real-world lakehouse architectures and migration strategies
Activities

Quiz

Objectives
  • Reinforce the core principles of Google Cloud's data platform.
Topics covered
  • Review
  • Best practices
Activities

Lab: Getting Started with BigQuery ML

Lab: Vector Search with BigQuery

Course 3: Build Batch Data Pipelines on Google Cloud

Objectives
  • Explain the critical role of a data engineer in developing and maintaining batch data pipelines.
  • Describe the core components and typical lifecycle of batch data pipelines from ingestion to downstream consumption.
  • Analyze common challenges in batch data processing, such as data volume, quality, complexity, and reliability, and identify key Google Cloud services that can address them.
Topics covered
  • Batch data pipelines and their use cases
  • Processing and common challenges
Activities

Quiz

Objectives
  • Design scalable batch data pipelines for high-volume data ingestion and transformation.
  • Optimize batch jobs for high throughput and cost-efficiency using various resource management and performance tuning techniques.
Topics covered
  • Design batch pipelines
  • Large-scale data transformations
  • Dataflow and Serverless for Apache Spark
  • Data connections and orchestration
  • Execute an Apache Spark pipeline
  • Optimize batch pipeline performance
Activities

Quiz

Lab: Build a Simple Batch Data Pipeline with Serverless for Apache Spark (optional)

Lab: Build a Simple Batch Data Pipeline with Dataflow Job Builder UI (optional)

Objectives
  • Develop data validation rules and cleansing logic to ensure data quality within batch pipelines.
  • Implement strategies for managing schema evolution and performing data deduplication in large datasets.
Topics covered
  • Batch data validation and cleansing
  • Log and analyze errors
  • Schema evolution for batch pipelines
  • Data integrity and duplication
  • Deduplication with Serverless for Apache Spark
  • Deduplication with Dataflow
Activities

Lab: Validate Data Quality in a Batch Pipeline with Serverless for Apache Spark (optional)

Quiz

Objectives
  • Orchestrate complex batch data pipeline workflows for efficient scheduling and lineage tracking.
  • Implement robust error handling, monitoring, and observability for batch data pipelines.
Topics covered
  • Orchestration for batch processing
  • Cloud Composer
  • Unified observability
  • Alerts and troubleshooting
  • Visual pipeline management
Activities

Lab: Building Batch Pipelines in Cloud Data Fusion

Quiz

Course 4: Build Streaming Data Pipelines on Google Cloud

Objectives
  • Introduce the course learning objectives and the scenario used for hands-on practice in building streaming data pipelines.
  • Describe the concept of streaming data pipelines, the challenges associated with them, and their role within the data engineering process.
Topics covered
  • Course learning objectives
  • Course prerequisites
  • The use case
  • About the company
  • The challenge
  • The mission
Objectives
  • Understand various streaming use cases and their applications, including Streaming ETL, Streaming AI/ML, Streaming Application, and Reverse ETL.
  • Identify and describe common sample architectures for streaming data, including Streaming ETL, Streaming AI/ML, Streaming Application, and Reverse ETL.
Topics covered
  • Introduction to streaming data pipelines on Google Cloud
  • Streaming ETL
  • Streaming AI/ML
  • Streaming applications
  • Reverse ETL
Activities

Quiz

Objectives
  • Pub/Sub and Managed Service for Apache Kafka: Define messaging concepts and know when to use Pub/Sub or Managed Service for Apache Kafka.
  • Dataflow: Describe the service and the challenges of streaming data; build and deploy a streaming pipeline.
  • BigQuery: Explore various data ingestion methods; use BigQuery continuous queries, BigQuery ETL, and reverse ETL; configure Pub/Sub to BigQuery streaming; architect BigQuery streaming pipelines.
  • Bigtable: Describe the big picture of data movement and interaction; establish a streaming pipeline from Dataflow to Bigtable; analyze the Bigtable continuous data stream for trends using BigQuery; synchronize the trend analysis back into the user-facing application.
Topics covered
  • Understanding the products
  • Architectural considerations for Pub/Sub and Managed Service for Apache Kafka
  • Dataflow: The processing powerhouse
  • BigQuery: The analytical engine
  • Bigtable: The solution for operational data
Activities

Lab: Stream data with pipelines - Esports use case (optional)

Quiz

Lab: Use Apache Beam and Bigtable to enrich esports downloadable content (DLC) data

Quiz

Lab: Stream e-sports data with Pub/Sub and BigQuery

Quiz

Lab: Monitor e-sports chat with Streamlit

Quiz

Topics covered
  • What you've accomplished
  • Next steps

Quality Process

SFEIR Institute's commitment: an excellence approach to ensure the quality and success of all our training programs.

Teaching Methods Used
  • Lectures / Theoretical Slides: Presentation of concepts using visual aids (PowerPoint, PDF).
  • Technical Demonstration (Demos): The instructor performs a task or procedure while students observe.
  • Guided Labs: Guided practical exercises on software, hardware, or technical environments.
  • Quiz / MCQ: Quick knowledge check (paper-based or digital via tools like Kahoot/Klaxoon).
Evaluation and Monitoring System

The achievement of training objectives is evaluated at multiple levels to ensure quality:

  • Continuous Knowledge Assessment: Verification of knowledge throughout the training via participatory methods (quizzes, practical exercises, case studies) under instructor supervision.
  • Progress Measurement: Comparative self-assessment system including an initial diagnostic to determine the starting level, followed by a final evaluation to validate skills development.
  • Quality Evaluation: End-of-session satisfaction questionnaire to measure the relevance and effectiveness of the training as perceived by participants.

Upcoming sessions

April 27, 2026
Remote • French
June 29, 2026
Remote • French
August 31, 2026
Remote • French
October 26, 2026
Remote • French
December 14, 2026
Remote • French

3,160€ excl. VAT

per learner