GCP200DE

Data Engineering on Google Cloud

Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data.

This course comprises the following four courses:

  • Introduction to Data Engineering on Google Cloud
  • Build Data Lakes and Data Warehouses with Google Cloud
  • Build Batch Data Pipelines on Google Cloud
  • Build Streaming Data Pipelines on Google Cloud

Official Google Cloud training
Level: Intermediate
Duration: 4 days (28h)

What you will learn

  • Design scalable data processing systems in Google Cloud.
  • Differentiate data architectures and implement data lakehouse and pipeline concepts.
  • Build and manage robust streaming and batch data pipelines.
  • Utilize AI/ML tools to optimize performance and gain process and data insights.

Prerequisites

  • Understanding of data engineering principles, including ETL/ELT processes, data modeling, and common data formats (Avro, Parquet, JSON).
  • Familiarity with data architecture concepts, specifically Data Warehouses and Data Lakes.
  • Proficiency in SQL for data querying.
  • Proficiency in a common programming language (Python recommended).
  • Familiarity with using Command Line Interfaces (CLI).
  • Familiarity with core Google Cloud concepts and services (Compute, Storage, and Identity management).

Target audience

  • Data Engineers, Data Analysts, Data Architects

Training Program

19 modules to master the fundamentals

Course 1: Introduction to Data Engineering on Google Cloud

Objectives
  • Explain the role of a data engineer.
  • Understand the differences between a data source and a data sink.
  • Explain the different types of data formats.
  • Explain the storage solution options on Google Cloud.
  • Learn about the metadata management options on Google Cloud.
  • Understand how to share datasets with ease using Analytics Hub.
  • Understand how to load data into BigQuery using the Google Cloud console or the gcloud CLI.
Topics covered
  • →The role of a data engineer
  • →Data sources versus data sinks
  • →Data formats
  • →Storage solution options on Google Cloud
  • →Metadata management options on Google Cloud
  • →Sharing datasets using Analytics Hub
Activities

Lab: Loading Data into BigQuery

Quiz

Objectives
  • Explain the baseline Google Cloud data replication and migration architecture.
  • Understand the options and use cases for the gcloud command-line tool.
  • Explain the functionality and use cases for Storage Transfer Service.
  • Explain the functionality and use cases for Transfer Appliance.
  • Understand the features and deployment of Datastream.
Topics covered
  • →Replication and migration architecture
  • →The gcloud command-line tool
  • →Moving datasets
  • →Datastream
Objectives
  • Explain the baseline extract and load architecture diagram.
  • Understand the options of the bq command-line tool.
  • Explain the functionality and use cases for BigQuery Data Transfer Service.
  • Explain the functionality and use cases for BigLake as a non-extract-load pattern.
Topics covered
  • →Extract and load architecture
  • →The bq command-line tool
  • →BigQuery Data Transfer Service
  • →BigLake
Activities

Lab: BigLake: Qwik Start

Quiz

Objectives
  • Explain the baseline extract, load, and transform architecture diagram.
  • Understand a common ELT pipeline on Google Cloud.
  • Learn about BigQuery's SQL scripting and scheduling capabilities.
  • Explain the functionality and use cases for Dataform.
Topics covered
  • →Extract, load, and transform (ELT) architecture
  • →SQL scripting and scheduling with BigQuery
  • →Dataform
Activities

Lab: Create and Execute a SQL Workflow in Dataform

Quiz

Objectives
  • Explain the baseline extract, transform, and load architecture diagram.
  • Learn about the GUI tools on Google Cloud used for ETL data pipelines.
  • Explain batch data processing using Dataproc.
  • Learn how to use Dataproc Serverless for Spark for ETL.
  • Explain streaming data processing options.
  • Explain the role Bigtable plays in data pipelines.
Topics covered
  • →Extract, transform, and load (ETL) architecture
  • →Google Cloud GUI tools for ETL data pipelines
  • →Batch data processing using Dataproc
  • →Streaming data processing options
  • →Bigtable and data pipelines
Activities

Lab: Use Dataproc Serverless for Spark to Load BigQuery (optional)

Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow

Quiz

Objectives
  • Explain the automation patterns and options available for pipelines.
  • Learn about Cloud Scheduler and Workflows.
  • Learn about Cloud Composer.
  • Learn about Cloud Run functions.
  • Explain the functionality and automation use cases for Eventarc.
Topics covered
  • →Automation patterns and options for pipelines
  • →Cloud Scheduler and Workflows
  • →Cloud Composer
  • →Cloud Run functions
  • →Eventarc
Activities

Lab: Use Cloud Run Functions to Load BigQuery (optional)

Quiz

Course 2: Build Data Lakes and Data Warehouses with Google Cloud

Objectives
  • Compare and contrast data lake, data warehouse, and data lakehouse architectures.
  • Evaluate the benefits of the lakehouse approach.
Topics covered
  • →The classics: Data lakes and data warehouses
  • →The modern approach: Data lakehouse
  • →Choosing the right architecture
Activities

Quiz

Objectives
  • Discuss data storage options, including Cloud Storage for files, open table formats like Apache Iceberg, BigQuery for analytic data, and AlloyDB for operational data.
  • Understand the role of AlloyDB for operational data use cases.
Topics covered
  • →Building a data lake foundation
  • →Introduction to Apache Iceberg open table format
  • →BigQuery as the central processing engine
  • →Combining operational data in AlloyDB
  • →Combining operational and analytical data with federated queries
  • →Real-world use case
Activities

Quiz

Lab: Federated Query with BigQuery

Objectives
  • Explain why BigQuery is a scalable data warehousing solution on Google Cloud.
  • Discuss the core concepts of BigQuery.
  • Understand BigLake's role in creating a unified lakehouse architecture and its integration with BigQuery for external data.
  • Learn how BigQuery natively interacts with Apache Iceberg tables via BigLake.
Topics covered
  • →BigQuery fundamentals
  • →Partitioning and clustering in BigQuery
  • →Introducing BigLake and external tables
Activities

Quiz

Lab: Querying External Data and Iceberg Tables

Objectives
  • Implement robust data governance and security practices across the unified data platform, including sensitive data protection and metadata management.
  • Explore advanced analytics and machine learning directly on lakehouse data.
Topics covered
  • →Data governance and security in a unified platform
  • →Demo: Data Loss Prevention
  • →Analytics and machine learning on the lakehouse
  • →Real-world lakehouse architectures and migration strategies
Activities

Quiz

Objectives
  • Reinforce the core principles of Google Cloud's data platform.
Topics covered
  • →Review
  • →Best practices
Activities

Lab: Getting Started with BigQuery ML

Lab: Vector Search with BigQuery

Course 3: Build Batch Data Pipelines on Google Cloud

Objectives
  • Explain the critical role of a data engineer in developing and maintaining batch data pipelines.
  • Describe the core components and typical lifecycle of batch data pipelines from ingestion to downstream consumption.
  • Analyze common challenges in batch data processing, such as data volume, quality, complexity, and reliability, and identify key Google Cloud services that can address them.
Topics covered
  • →Batch data pipelines and their use cases
  • →Processing and common challenges
Activities

Quiz

Objectives
  • Design scalable batch data pipelines for high-volume data ingestion and transformation.
  • Optimize batch jobs for high throughput and cost-efficiency using various resource management and performance tuning techniques.
Topics covered
  • →Design batch pipelines
  • →Large-scale data transformations
  • →Dataflow and Serverless for Apache Spark
  • →Data connections and orchestration
  • →Execute an Apache Spark pipeline
  • →Optimize batch pipeline performance
Activities

Quiz

Lab: Build a Simple Batch Data Pipeline with Serverless for Apache Spark (optional)

Lab: Build a Simple Batch Data Pipeline with Dataflow Job Builder UI (optional)

Objectives
  • Develop data validation rules and cleansing logic to ensure data quality within batch pipelines.
  • Implement strategies for managing schema evolution and performing data deduplication in large datasets.
Topics covered
  • →Batch data validation and cleansing
  • →Log and analyze errors
  • →Schema evolution for batch pipelines
  • →Data integrity and duplication
  • →Deduplication with Serverless for Apache Spark
  • →Deduplication with Dataflow
Activities

Lab: Validate Data Quality in a Batch Pipeline with Serverless for Apache Spark (optional)

Quiz

Objectives
  • Orchestrate complex batch data pipeline workflows for efficient scheduling and lineage tracking.
  • Implement robust error handling, monitoring, and observability for batch data pipelines.
Topics covered
  • →Orchestration for batch processing
  • →Cloud Composer
  • →Unified observability
  • →Alerts and troubleshooting
  • →Visual pipeline management
Activities

Lab: Building Batch Pipelines in Cloud Data Fusion

Quiz

Course 4: Build Streaming Data Pipelines on Google Cloud

Objectives
  • Introduce the course learning objectives and the scenario used for hands-on practice in building streaming data pipelines.
  • Describe the concept of streaming data pipelines, the challenges associated with them, and their role within the data engineering process.
Topics covered
  • →Course learning objectives
  • →Course prerequisites
  • →The use case
  • →About the company
  • →The challenge
  • →The mission
Objectives
  • Understand various streaming use cases and their applications, including Streaming ETL, Streaming AI/ML, Streaming Application, and Reverse ETL.
  • Identify and describe common sample architectures for streaming data, including Streaming ETL, Streaming AI/ML, Streaming Application, and Reverse ETL.
Topics covered
  • →Introduction to streaming data pipelines on Google Cloud
  • →Streaming ETL
  • →Streaming AI/ML
  • →Streaming applications
  • →Reverse ETL
Activities

Quiz

Objectives
  • Pub/Sub and Managed Service for Apache Kafka: Define messaging concepts and know when to use Pub/Sub or Managed Service for Apache Kafka.
  • Dataflow: Describe the service and the challenges of streaming data, then build and deploy a streaming pipeline.
  • BigQuery: Explore various data ingestion methods; use BigQuery continuous queries, BigQuery ETL, and reverse ETL; configure Pub/Sub to BigQuery streaming; and architect BigQuery streaming pipelines.
  • Bigtable: Describe the big picture of data movement and interaction, establish a streaming pipeline from Dataflow to Bigtable, analyze the Bigtable continuous data stream for trends using BigQuery, and synchronize the trend analysis back into the user-facing application.
Topics covered
  • →Understanding the products
  • →Architectural considerations for Pub/Sub and Managed Service for Apache Kafka
  • →Dataflow: The processing powerhouse
  • →BigQuery: The analytical engine
  • →Bigtable: The solution for operational data
Activities

Lab: Stream data with pipelines - Esports use case (optional)

Quiz

Lab: Use Apache Beam and Bigtable to enrich esports downloadable content (DLC) data

Quiz

Lab: Stream e-sports data with Pub/Sub and BigQuery

Quiz

Lab: Monitor e-sports chat with Streamlit

Quiz

Topics covered
  • →What you've accomplished
  • →Next steps

Related Trainings

SFEIR Institute
Best

dbt

Learn to transform your data with dbt, the leading tool in the Modern Data Stack. You'll start by understanding the evolution of data architectures and the difference between ETL and ELT. You'll install dbt, create your first project and connect it to your data sources. Then you'll learn to build structured data models, choose the right materialization options (table, view, incremental) and organize your metadata with tags. You'll discover how to reference your sources and manage dependencies between models. You'll explore advanced features: seeds to initialize reference data, snapshots to track history and manage slowly changing dimensions, Jinja macros and variables to automate your transformations. Finally, you'll implement automated tests to ensure data quality, document your models with lineage, and discover packages from the dbt community. Hands-on training with 60% labs.

Duration: 2 days
Level: Fundamental

Google Cloud

Introduction to Data Analytics on Google Cloud

This course is an introduction to data analytics on Google Cloud. It is designed for learners who have no prior experience with data analytics or Google Cloud. The course covers the basics of data analysis, including collection, storage, exploration, visualization, and sharing. It also introduces learners to Google Cloud's data analytics tools and services. Through video lectures, demos, quizzes, and hands-on labs, the course demonstrates how to go from raw data to impactful visualizations and dashboards.

Duration: 1 day
Level: Fundamental

Upcoming sessions

  • April 27, 2026 (Remote, French)
  • June 29, 2026 (Remote, French)
  • August 31, 2026 (Remote, French)
  • October 26, 2026 (Remote, French)
  • December 14, 2026 (Remote, French)

Quality Process

SFEIR Institute's commitment: an excellence approach to ensure the quality and success of all our training programs.

Teaching Methods Used
  • Lectures / Theoretical Slides — Presentation of concepts using visual aids (PowerPoint, PDF).
  • Technical Demonstration (Demos) — The instructor performs a task or procedure while students observe.
  • Guided Labs — Guided practical exercises on software, hardware, or technical environments.
  • Quiz / MCQ — Quick knowledge check (paper-based or digital via tools like Kahoot/Klaxoon).
Evaluation and Monitoring System

The achievement of training objectives is evaluated at multiple levels to ensure quality:

  • Continuous Knowledge Assessment: Verification of knowledge throughout the training via participatory methods (quizzes, practical exercises, case studies) under instructor supervision.
  • Progress Measurement: Comparative self-assessment system including an initial diagnostic to determine the starting level, followed by a final evaluation to validate skills development.
  • Quality Evaluation: End-of-session satisfaction questionnaire to measure the relevance and effectiveness of the training as perceived by participants.

€3,160 excl. VAT per learner