GCP200DE

Data Engineering on Google Cloud

Get hands-on experience designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, and analyze data. This course covers structured, unstructured, and streaming data.

✓ Official Google Cloud training • Level: Intermediate • ⏱️ 4 days (28h)

What you will learn

  • Design and build data processing systems on Google Cloud.
  • Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
  • Derive business insights from extremely large datasets using BigQuery.
  • Leverage unstructured data using Spark and ML APIs on Dataproc.
  • Enable instant insights from streaming data.

Prerequisites

  • Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.
  • Basic proficiency with a common query language such as SQL.
  • Experience with data modeling and ETL (extract, transform, load) activities.
  • Experience developing applications using a common programming language such as Python.

Target audience

  • Data engineers, database administrators, and system administrators

Training Program

18 modules to master the fundamentals

Module 1

Objectives

  • Explain the role of a data engineer.
  • Understand the differences between a data source and a data sink.
  • Explain the different types of data formats.
  • Explain the storage solution options on Google Cloud.
  • Learn about the metadata management options on Google Cloud.
  • Understand how to share datasets with ease using Analytics Hub.
  • Understand how to load data into BigQuery using the Google Cloud console and/or the gcloud CLI.

Topics covered

  • The role of a data engineer
  • Data sources versus data sinks
  • Data formats
  • Storage solution options on Google Cloud
  • Metadata management options on Google Cloud
  • Share datasets using Analytics Hub

Activities

Lab: Loading Data into BigQuery
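
As a preview of what this lab covers, here is a minimal sketch of the same load done programmatically with the google-cloud-bigquery Python client (the course itself demonstrates the console and CLI paths); the bucket, file, and table names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Describe the load: CSV with a header row, schema inferred automatically.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

# Hypothetical source file and destination table.
load_job = client.load_table_from_uri(
    "gs://my-bucket/data/sales.csv",
    "my-project.my_dataset.sales",
    job_config=job_config,
)
load_job.result()  # block until the load job completes
```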

Module 2

Objectives

  • Explain the baseline Google Cloud data replication and migration architecture.
  • Understand the options and use cases for the gcloud command line tool.
  • Explain the functionality and use cases for the Storage Transfer Service.
  • Explain the functionality and use cases for the Transfer Appliance.
  • Understand the features and deployment of Datastream.

Topics covered

  • Replication and migration architecture
  • The gcloud command line tool
  • Moving datasets
  • Datastream

Activities

Lab: Datastream: PostgreSQL Replication to BigQuery

Module 3

Objectives

  • Explain the baseline extract and load architecture diagram.
  • Understand the options of the bq command line tool.
  • Explain the functionality and use cases for the BigQuery Data Transfer Service.
  • Explain the functionality and use cases for BigLake as a non-extract-load pattern.

Topics covered

  • Extract and load architecture
  • The bq command line tool
  • BigQuery Data Transfer Service
  • BigLake

Activities

Lab: BigLake: Qwik Start
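
To make the BigLake pattern concrete: a BigLake table is defined over files in Cloud Storage through a connection resource, so the data can be queried in place with no extract-load step. Below is a minimal sketch of the DDL submitted through the Python client; the connection, dataset, and bucket names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical connection (`us.my-connection`), dataset, and Parquet files.
ddl = """
CREATE OR REPLACE EXTERNAL TABLE my_dataset.orders_biglake
WITH CONNECTION `us.my-connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bucket/orders/*.parquet']
)
"""
client.query(ddl).result()  # the table is now queryable like any other
```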

Module 4

Objectives

  • Explain the baseline extract, load, and transform architecture diagram.
  • Understand a common ELT pipeline on Google Cloud.
  • Learn about BigQuery's SQL scripting and scheduling capabilities.
  • Explain the functionality and use cases for Dataform.

Topics covered

  • Extract, load, and transform (ELT) architecture
  • SQL scripting and scheduling with BigQuery
  • Dataform

Activities

Lab: Create and Execute a SQL Workflow in Dataform
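
To illustrate the scripting capability this module introduces, here is a minimal sketch of a multi-statement BigQuery script run through the Python client. It queries a public sample table; the staging dataset name is a placeholder and must exist beforehand.

```python
from google.cloud import bigquery

client = bigquery.Client()

# A multi-statement script: declare a variable, then materialize a table.
script = """
DECLARE min_count INT64 DEFAULT 100;

CREATE OR REPLACE TABLE staging.frequent_words AS
SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
WHERE word_count >= min_count;
"""
client.query(script).result()  # the whole script runs as one job
```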

Module 5

Objectives

  • Explain the baseline extract, transform, and load architecture diagram.
  • Learn about the GUI tools on Google Cloud used for ETL data pipelines.
  • Explain batch data processing using Dataproc.
  • Learn to use Dataproc Serverless for Spark for ETL.
  • Explain streaming data processing options.
  • Explain the role Bigtable plays in data pipelines.

Topics covered

  • Extract, transform, and load (ETL) architecture
  • Google Cloud GUI tools for ETL data pipelines
  • Batch data processing using Dataproc
  • Streaming data processing options
  • Bigtable and data pipelines

Activities

Lab: Use Dataproc Serverless for Spark to Load BigQuery
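
For a flavor of what the Dataproc Serverless lab involves, here is a minimal PySpark sketch that reads CSV from Cloud Storage and writes to BigQuery through the Spark BigQuery connector (bundled with Dataproc Serverless); the bucket and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-bigquery").getOrCreate()

# Read raw CSV from a hypothetical Cloud Storage bucket.
df = spark.read.option("header", True).csv("gs://my-bucket/raw/trips.csv")

# A trivial transform: drop incomplete rows.
clean = df.dropna()

# Write to BigQuery; the connector stages data in a temporary GCS bucket.
(clean.write.format("bigquery")
    .option("table", "my-project.staging.trips")
    .option("temporaryGcsBucket", "my-temp-bucket")
    .mode("overwrite")
    .save())
```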

Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow

Module 6

Objectives

  • Explain the automation patterns and options available for pipelines.
  • Learn about Cloud Scheduler and workflows.
  • Learn about Cloud Composer.
  • Learn about Cloud Run functions.
  • Explain the functionality and automation use cases for Eventarc.

Topics covered

  • Automation patterns and options for pipelines
  • Cloud Scheduler and Workflows
  • Cloud Composer
  • Cloud Run functions
  • Eventarc

Activities

Lab: Use Cloud Run Functions to Load BigQuery
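
As a sketch of the event-driven pattern behind this lab, the following Python Cloud Run function (written with the Functions Framework) reacts to a Cloud Storage object-finalize event and loads the new file into BigQuery; the table ID is a placeholder and error handling is omitted.

```python
import functions_framework
from google.cloud import bigquery

TABLE_ID = "my-project.staging.uploads"  # hypothetical destination table

@functions_framework.cloud_event
def gcs_to_bigquery(cloud_event):
    # The event payload carries the bucket and object name of the new file.
    data = cloud_event.data
    uri = f"gs://{data['bucket']}/{data['name']}"

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
        write_disposition="WRITE_APPEND",
    )
    client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()
```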

Module 7

Objectives

  • Discuss the challenges of data engineering and how building data pipelines in the cloud helps to address them.
  • Review and understand the purpose of a data lake versus a data warehouse, and when to use which.

Topics covered

  • The data engineer's role
  • Data engineering challenges
  • Introduction to BigQuery
  • Data lakes and data warehouses
  • Transactional databases versus data warehouses
  • Effective partnership with other data teams
  • Managing data access and governance
  • Building production-ready pipelines
  • Google Cloud customer case study

Activities

Lab: Using BigQuery to Do Analysis

Module 8

Objectives

  • Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
  • Explain how to use Cloud SQL for a relational data lake.

Topics covered

  • Introduction to data lakes
  • Data storage and ETL options on Google Cloud
  • Building a data lake using Cloud Storage
  • Securing Cloud Storage
  • Storing all sorts of data types
  • Cloud SQL as your OLTP system

Activities

Lab: Loading Taxi Data into Cloud SQL
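
To make the data lake idea concrete, here is a minimal sketch of landing a raw file in Cloud Storage with the Python client, using a zone/source/date object layout; the bucket name and paths are placeholders.

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-data-lake")  # hypothetical bucket

# Object names act as folders: raw zone / source system / ingestion date.
blob = bucket.blob("raw/taxi/2026-02-09/trips.csv")
blob.upload_from_filename("trips.csv")  # local file to upload
```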

Module 9

Objectives

  • Discuss requirements of a modern warehouse.
  • Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
  • Discuss the core concepts of BigQuery and review options of loading data into BigQuery.

Topics covered

  • The modern data warehouse
  • Introduction to BigQuery
  • Getting started with BigQuery
  • Loading data into BigQuery
  • Exploring schemas
  • Schema design
  • Nested and repeated fields
  • Optimizing with partitioning and clustering

Activities

Lab: Working with JSON and Array Data in BigQuery

Lab: Partitioned Tables in BigQuery
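
Tying the last two topics together, here is a minimal sketch that creates a table with a nested, repeated field, daily partitioning, and clustering via the Python client; the project, dataset, and field names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

# A schema with a nested, repeated RECORD field.
schema = [
    bigquery.SchemaField("event_ts", "TIMESTAMP"),
    bigquery.SchemaField("user_id", "STRING"),
    bigquery.SchemaField(
        "events", "RECORD", mode="REPEATED",
        fields=[
            bigquery.SchemaField("name", "STRING"),
            bigquery.SchemaField("value", "FLOAT"),
        ],
    ),
]

table = bigquery.Table("my-project.analytics.user_events", schema=schema)
# Partition by day on the timestamp column, cluster by user_id.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_ts"
)
table.clustering_fields = ["user_id"]
client.create_table(table)
```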

Module 10

Objectives

  • Review different methods of loading data into your data lakes and warehouses: EL, ELT, and ETL.

Topics covered

  • EL, ELT, ETL
  • Quality considerations
  • Ways of executing operations in BigQuery
  • Shortcomings of ELT
  • ETL to solve data quality issues

Module 11

Objectives

  • Review the Hadoop ecosystem.
  • Discuss how to lift and shift your existing Hadoop workloads to the cloud using Dataproc.
  • Explain when you would use Cloud Storage instead of HDFS storage.
  • Explain how to optimize Dataproc jobs.

Topics covered

  • The Hadoop ecosystem
  • Run Hadoop on Dataproc
  • Cloud Storage instead of HDFS
  • Optimize Dataproc

Activities

Lab: Running Apache Spark Jobs on Dataproc
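
For context on what "lift and shift" looks like in practice, here is a minimal sketch that submits an existing PySpark script to a Dataproc cluster with the Python client; the region, cluster, and script path are placeholders.

```python
from google.cloud import dataproc_v1

region = "us-central1"  # hypothetical region
client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Point an unmodified Spark script at an existing cluster.
job = {
    "placement": {"cluster_name": "my-cluster"},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/wordcount.py"},
}
operation = client.submit_job_as_operation(
    request={"project_id": "my-project", "region": region, "job": job}
)
print(operation.result().driver_output_resource_uri)
```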

Module 12

Objectives

  • Identify features customers value in Dataflow.
  • Discuss core concepts in Dataflow.
  • Review the use of Dataflow templates and SQL.
  • Write a simple Dataflow pipeline and run it both locally and on the cloud.
  • Identify Map and Reduce operations, execute the pipeline, and use command line parameters.
  • Read data from BigQuery into Dataflow and use the output of a pipeline as a side-input to another pipeline.

Topics covered

  • Introduction to Dataflow
  • Reasons why customers value Dataflow
  • Dataflow pipelines
  • Aggregating with GroupByKey and Combine
  • Side inputs and windows
  • Dataflow templates

Activities

Lab: A Simple Dataflow Pipeline (Python/Java)

Lab: MapReduce in Beam (Python/Java)

Lab: Side Inputs (Python/Java)
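
As a taste of the MapReduce lab, here is a minimal Apache Beam pipeline in Python that counts words with Map and CombinePerKey; it runs locally on the direct runner with in-memory input.

```python
import apache_beam as beam

with beam.Pipeline() as pipeline:  # direct runner by default
    (
        pipeline
        | "Create" >> beam.Create(["the cat sat", "the dog sat"])
        | "Split" >> beam.FlatMap(str.split)    # Map phase: emit words
        | "Pair" >> beam.Map(lambda w: (w, 1))  # key-value pairs
        | "Count" >> beam.CombinePerKey(sum)    # Reduce phase: sum per key
        | "Print" >> beam.Map(print)
    )
```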

Module 13

Objectives

  • Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer.
  • Summarize how Cloud Data Fusion allows data analysts and ETL developers to wrangle data and build pipelines in a visual way.
  • Describe how Cloud Composer can help to orchestrate the work across multiple Google Cloud services.

Topics covered

  • Build batch data pipelines visually with Cloud Data Fusion (components, UI overview, building a pipeline, exploring data using Wrangler)
  • Orchestrate work between Google Cloud services with Cloud Composer (Apache Airflow environment, DAGs and operators, workflow scheduling, monitoring and logging)

Activities

Lab: Building and Executing a Pipeline Graph in Data Fusion

Lab: An Introduction to Cloud Composer
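
To show what orchestration code looks like in Cloud Composer, here is a minimal Airflow DAG sketch with two dependent tasks; the DAG ID and commands are placeholders, and a real pipeline would use Google Cloud operators rather than BashOperator.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_ingest",            # hypothetical DAG name
    start_date=datetime(2026, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")

    extract >> load  # run extract before load
```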

Module 14

Objectives

  • Explain streaming data processing.
  • Identify the Google Cloud products and tools that can help address streaming data challenges.

Topics covered

  • Process streaming data

Module 15

Objectives

  • Describe the Pub/Sub service.
  • Explain how Pub/Sub works.
  • Simulate real-time streaming sensor data using Pub/Sub.

Topics covered

  • Introduction to Pub/Sub
  • Pub/Sub push versus pull
  • Publishing with Pub/Sub code

Activities

Lab: Publish Streaming Data into Pub/Sub
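
Here is a minimal sketch of the publishing side covered in this module, using the Python client; the project and topic names are placeholders.

```python
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "sensor-readings")

# Simulate one sensor reading; attributes can carry routing metadata.
payload = json.dumps({"sensor_id": "s-42", "temp_c": 21.5}).encode("utf-8")
future = publisher.publish(topic_path, payload, origin="simulator")
print(future.result())  # server-assigned message ID
```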

Module 16

Objectives

  • Describe the Dataflow service.
  • Build a stream processing pipeline for live traffic data.
  • Demonstrate how to handle late data using watermarks, triggers, and accumulation.

Topics covered

  • Streaming data challenges
  • Dataflow windowing

Activities

Lab: Streaming Data Pipelines
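
To make windows, watermarks, and triggers concrete, here is a sketch of the windowing step such a streaming pipeline might contain, in Apache Beam Python; the project and subscription names are placeholders, and the messages are assumed to be "sensor_id,value" strings.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import trigger, window

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Hypothetical subscription carrying comma-separated readings.
        | beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/traffic-sub")
        | beam.Map(lambda b: (b.decode().split(",")[0], 1))
        # One-minute fixed windows; fire at the watermark, then once per
        # late element for up to five minutes, accumulating across firings.
        | beam.WindowInto(
            window.FixedWindows(60),
            trigger=trigger.AfterWatermark(late=trigger.AfterCount(1)),
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
            allowed_lateness=300,
        )
        | beam.CombinePerKey(sum)  # events per sensor per window
        | beam.Map(print)
    )
```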

Module 17

Objectives

  • Describe how to perform ad-hoc analysis on streaming data using BigQuery and dashboards.
  • Discuss Bigtable as a low-latency solution.
  • Describe how to architect for Bigtable and how to ingest data into Bigtable.
  • Highlight performance considerations for the relevant services.

Topics covered

  • Streaming into BigQuery and visualizing results
  • High-throughput streaming with Bigtable
  • Optimizing Bigtable performance

Activities

Lab: Streaming Analytics and Dashboards

Lab: Generate Personalized Email Content with BigQuery Continuous Queries and Gemini

Lab: Streaming Data Pipelines into Bigtable
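
As a small illustration of the first topic, streaming rows into BigQuery, here is a sketch using the Python client's legacy streaming API (insert_rows_json); production pipelines would more often go through Dataflow or the Storage Write API, and the table name is a placeholder.

```python
from google.cloud import bigquery

client = bigquery.Client()

rows = [
    {"user_id": "u1", "page": "/home", "event_ts": "2026-02-09T10:00:00Z"},
]
# Rows become queryable within seconds of a successful insert.
errors = client.insert_rows_json("my-project.analytics.page_views", rows)
if errors:
    print("Insert failed:", errors)
```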

Module 18

Objectives

  • Review some of BigQuery's advanced analysis capabilities.
  • Discuss ways to improve query performance.

Topics covered

  • Analytic window functions
  • GIS functions
  • Performance considerations

Activities

Lab: Optimizing Your BigQuery Queries for Performance
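
As a preview of the analytic window functions this module reviews, here is a minimal sketch run through the Python client against a public sample table: it ranks words by frequency within each corpus and keeps the top three.

```python
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  corpus,
  word,
  word_count,
  RANK() OVER (PARTITION BY corpus ORDER BY word_count DESC) AS rnk
FROM `bigquery-public-data.samples.shakespeare`
QUALIFY rnk <= 3
"""
for row in client.query(sql).result():
    print(row.corpus, row.word, row.word_count)
```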

Quality Process

SFEIR Institute's commitment: an approach to excellence that ensures the quality and success of all our training programs.

Teaching Methods Used

  • Lectures / Theoretical Slides — Presentation of concepts using visual aids (PowerPoint, PDF).
  • Technical Demonstration (Demos) — The instructor performs a task or procedure while students observe.
  • Guided Labs — Guided practical exercises on software, hardware, or technical environments.
  • Quiz / MCQ — Quick knowledge check (paper-based or digital via tools like Kahoot/Klaxoon).

Evaluation and Monitoring System

The achievement of training objectives is evaluated at multiple levels to ensure quality:

  • Continuous Knowledge Assessment: Verification of knowledge throughout the training via participatory methods (quizzes, practical exercises, case studies) under instructor supervision.
  • Progress Measurement: A comparative self-assessment system, with an initial diagnostic to determine the starting level followed by a final evaluation to validate skills development.
  • Quality Evaluation: An end-of-session satisfaction questionnaire measuring the relevance and effectiveness of the training as perceived by participants.

Upcoming sessions

  • February 9, 2026 • Remote • French
  • April 27, 2026 • Remote • French
  • June 29, 2026 • Remote • French
  • August 31, 2026 • Remote • French
  • October 26, 2026 • Remote • French
  • December 14, 2026 • Remote • French

€2,800 excl. VAT per learner