Data Integration with Cloud Data Fusion
This 2-day course introduces learners to Google Cloud's data integration capability, Cloud Data Fusion. We discuss the challenges of data integration and the need for a data integration platform (middleware), then show how Cloud Data Fusion can effectively integrate data from a variety of sources and formats to generate insights. We look at Cloud Data Fusion's main components and how they work, how to process batch and real-time streaming data with visual pipeline design, how to track metadata and data lineage, and how to deploy data pipelines on various execution engines.

What you will learn
- Identify the need for data integration
- Understand the capabilities Cloud Data Fusion provides as a data integration platform
- Identify use cases for possible implementation with Cloud Data Fusion
- List the core components of Cloud Data Fusion
- Design and execute batch and real-time data processing pipelines
- Work with Wrangler to build data transformations
- Use connectors to integrate data from various sources and formats
- Configure the execution environment
- Monitor and troubleshoot pipeline execution
- Understand the relationship between metadata and data lineage
Prerequisites
- Completed "Introduction to Data Engineering"
Target audience
- Data Engineers and Data Analysts
Training Program
9 modules to master the fundamentals
Module 1: Course Introduction
Objectives
- Introduce the course objectives
Topics covered
- Course Introduction
Module 2: Introduction to Data Integration and Cloud Data Fusion
Objectives
- Understand the need for data integration
- List situations where data integration can help businesses
- List the available data integration platforms and tools
- Identify the challenges with data integration
- Understand the use of Cloud Data Fusion as a data integration platform
- Create a Cloud Data Fusion instance
- Become familiar with the core framework and major components of Cloud Data Fusion
Topics covered
- Data integration: what, why, challenges
- Data integration tools used in industry
- User personas
- Introduction to Cloud Data Fusion
- Data integration critical capabilities
- Cloud Data Fusion UI components
Activities
Graded lab, quiz, and discussion activity
Module 3: Building Pipelines
Objectives
- Understand Cloud Data Fusion architecture
- Define what a data pipeline is
- Understand the DAG representation of a data pipeline
- Learn to use Pipeline Studio and its components
- Design a simple pipeline using Pipeline Studio
- Deploy and execute a pipeline
Topics covered
- Cloud Data Fusion architecture
- Core concepts
- Data pipelines and directed acyclic graphs (DAG); see the sketch after this module
- Pipeline lifecycle
- Designing pipelines in Pipeline Studio
Activities
Graded lab and quiz
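To give a feel for the DAG topic ahead of the module: a deployed pipeline is stored as JSON in which stages are nodes and connections are directed edges, and a valid pipeline graph must be acyclic. The sketch below is illustrative only; the stage and field names are abridged placeholders, not the exact export schema.

```python
# Abridged sketch of a pipeline as a DAG: stages are nodes, connections are
# directed edges. Field names are illustrative, not the exact export schema.
from collections import defaultdict, deque

pipeline = {
    "name": "customer-cleanup",  # hypothetical pipeline name
    "config": {
        "stages": [
            {"name": "GCSSource", "plugin": {"type": "batchsource"}},
            {"name": "Wrangler", "plugin": {"type": "transform"}},
            {"name": "BQSink", "plugin": {"type": "batchsink"}},
        ],
        "connections": [
            {"from": "GCSSource", "to": "Wrangler"},
            {"from": "Wrangler", "to": "BQSink"},
        ],
    },
}

def is_dag(stages, connections):
    """Kahn's algorithm: True if the stage graph has no cycles."""
    indegree = {s["name"]: 0 for s in stages}
    successors = defaultdict(list)
    for edge in connections:
        successors[edge["from"]].append(edge["to"])
        indegree[edge["to"]] += 1
    queue = deque(name for name, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return visited == len(indegree)

assert is_dag(pipeline["config"]["stages"], pipeline["config"]["connections"])
```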
Module 4: Designing Complex Pipelines
Objectives
- Perform branching, merging, and join operations
- Execute a pipeline with runtime arguments using macros (see the example after this module)
- Work with error handlers
- Run pre- and post-pipeline tasks with the help of actions and notifications
- Schedule pipelines for execution
- Import and export existing pipelines
Topics covered
- Branching, merging, and joining
- Actions and notifications
- Error handling and macros
- Pipeline configurations, scheduling, import, and export
Activities
Graded labs and quiz
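As a preview of the macros topic: a plugin property can reference a macro such as ${input.path}, and the value is resolved from runtime arguments when the pipeline starts. The sketch below passes runtime arguments through the CDAP REST API that backs Cloud Data Fusion; the instance URL, pipeline name, and argument key are hypothetical placeholders, and the endpoint path should be verified against the docs for your version.

```python
# Sketch: start a deployed batch pipeline with runtime arguments that fill
# in macros such as ${input.path}. CDAP_ENDPOINT, the pipeline name, and the
# argument key are hypothetical; verify the endpoint for your version.
import requests

CDAP_ENDPOINT = "https://<instance-endpoint>/api"  # hypothetical instance URL
PIPELINE = "customer-cleanup"                      # hypothetical pipeline name

# Keys must match the macro names used in plugin properties, e.g. a source
# whose "path" property is set to ${input.path}.
runtime_args = {"input.path": "gs://my-bucket/input/2024-01-01/*.csv"}

# Batch pipelines run as the DataPipelineWorkflow program; POSTing a JSON
# body to its start endpoint supplies the runtime arguments.
resp = requests.post(
    f"{CDAP_ENDPOINT}/v3/namespaces/default/apps/{PIPELINE}"
    "/workflows/DataPipelineWorkflow/start",
    json=runtime_args,
    headers={"Authorization": "Bearer <ACCESS_TOKEN>"},  # e.g. from gcloud auth print-access-token
)
resp.raise_for_status()
```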
Module 5: Pipeline Execution Environment
Objectives
- Understand the composition of an execution environment
- Configure your pipeline's execution environment, logging, and metrics
- Understand concepts such as compute profiles and provisioners
- Create a compute profile (see the sketch after this module)
- Create pipeline alerts
- Monitor the pipeline under execution
Topics covered
- Schedules and triggers
- Execution environment: compute profiles and provisioners
- Monitoring pipelines
Activities
Quiz
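To preview compute profiles: a profile names a provisioner (typically Dataproc on Google Cloud) plus the properties that provisioner uses to create, and later tear down, the execution environment. The sketch below shows the general shape only; the profile name is hypothetical and the property names are indicative, so create real profiles from the Cloud Data Fusion UI or the documented API rather than from this sketch.

```python
# Illustrative shape of a compute profile: a named provisioner plus the
# properties it needs to stand up an execution environment. The profile
# name is hypothetical and property names are indicative, not authoritative.
compute_profile = {
    "name": "small-dataproc",            # hypothetical profile name
    "label": "Small Dataproc cluster",
    "provisioner": {
        "name": "gcp-dataproc",          # Dataproc provisioner
        "properties": [
            {"name": "region", "value": "us-central1"},
            {"name": "masterNumNodes", "value": "1"},
            {"name": "workerNumNodes", "value": "2"},
            {"name": "workerMachineType", "value": "n1-standard-4"},
        ],
    },
}
```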
Module 6: Building Transformations and Preparing Data with Wrangler
Objectives
- Understand the use of Wrangler and its main components
- Transform data using Wrangler UI
- Transform data using directives and CLI methods (see the sample recipe after this module)
- Create and use user-defined directives
Topics covered
- Wrangler
- Directives
- User-defined directives
Activities
Graded lab and quiz
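As a preview of directives: a Wrangler recipe is an ordered list of directives applied to each record. The directive names below (parse-as-csv, drop, rename, set-type, fill-null-or-empty) are common documented ones, but the column names are hypothetical and the exact arguments should be checked against the Wrangler directives reference.

```python
# A Wrangler recipe, one directive per line, applied in order to each record:
#   parse-as-csv       - split the raw :body column on commas (header row = true)
#   drop               - remove the original raw column
#   rename             - give the id column a clearer name
#   set-type           - enforce an integer type on the id
#   fill-null-or-empty - default value for missing emails
# Column names are hypothetical placeholders.
recipe = """
parse-as-csv :body ',' true
drop :body
rename :cust_id :customer_id
set-type :customer_id integer
fill-null-or-empty :email 'unknown'
"""
print(recipe)
```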
Module 7: Connectors and Streaming Pipelines
Objectives
- Understand the data integration architecture
- List various connectors
- Use the Cloud Data Loss Prevention (DLP) API (see the sketch after this module)
- Understand the reference architecture of streaming pipelines
- Build and execute a streaming pipeline
Topics covered
- Connectors
- DLP
- Reference architecture for streaming applications
- Building streaming pipelines
Activities
Graded lab, quiz, and discussion activity
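Within a pipeline, DLP is normally used through the dedicated Data Fusion plugins, but the same API can be called directly, which is a quick way to see what it detects. Below is a minimal sketch using the google-cloud-dlp Python client to scan a string for email addresses; the project ID is a placeholder.

```python
# Minimal sketch: inspect a string for sensitive data with the Cloud DLP API.
# (Inside Data Fusion you would normally use the DLP plugins instead.)
# Requires: pip install google-cloud-dlp. PROJECT_ID is a placeholder.
from google.cloud import dlp_v2

PROJECT_ID = "my-project"  # hypothetical project ID

client = dlp_v2.DlpServiceClient()
response = client.inspect_content(
    request={
        "parent": f"projects/{PROJECT_ID}",
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        "item": {"value": "Contact: jane.doe@example.com"},
    }
)
for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood)
```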
Module 8: Metadata and Data Lineage
Objectives
- List types of metadata
- Differentiate between business, technical, and operational metadata
- Understand what data lineage is
- Understand the importance of maintaining data lineage
- Differentiate between metadata and data lineage
Topics covered
- Metadata
- Data lineage (see the sketch after this module)
Activities
Graded lab and quiz
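To make lineage concrete: Cloud Data Fusion records dataset lineage automatically, and the CDAP lineage HTTP API behind the instance can be queried for a dataset over a time window. The sketch below follows the general shape of that API, but the instance URL and dataset name are placeholders and the exact path should be verified for your version.

```python
# Hedged sketch: query dataset-level lineage over the last 24 hours via the
# CDAP API that backs Cloud Data Fusion. The instance URL and dataset name
# are placeholders; verify the endpoint path for your version.
import time
import requests

CDAP_ENDPOINT = "https://<instance-endpoint>/api"  # hypothetical instance URL
DATASET = "customer_table"                         # hypothetical dataset name

end = int(time.time())
start = end - 24 * 3600  # look back 24 hours

resp = requests.get(
    f"{CDAP_ENDPOINT}/v3/namespaces/default/datasets/{DATASET}/lineage",
    params={"start": start, "end": end},
    headers={"Authorization": "Bearer <ACCESS_TOKEN>"},
)
resp.raise_for_status()
# The response lists the programs that read or wrote the dataset in the window.
print(resp.json())
```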
Module 9: Course Summary
Objectives
- Review the course objectives and concepts
Topics covered
- Course Summary
Quality Process
SFEIR Institute's commitment: a pursuit of excellence to ensure the quality and success of all our training programs.
- Lectures / Theoretical Slides — Presentation of concepts using visual aids (PowerPoint, PDF).
- Technical Demonstration (Demos) — The instructor performs a task or procedure while students observe.
- Guided Labs — Guided practical exercises on software, hardware, or technical environments.
- Quiz / MCQ — Quick knowledge check (paper-based or digital via tools like Kahoot/Klaxoon).
The achievement of training objectives is evaluated at multiple levels to ensure quality:
- Continuous Knowledge Assessment: verification of knowledge throughout the training via participatory methods (quizzes, practical exercises, case studies) under instructor supervision.
- Progress Measurement: a comparative self-assessment system, including an initial diagnostic to determine the starting level followed by a final evaluation to validate skills development.
- Quality Evaluation: an end-of-session satisfaction questionnaire to measure the relevance and effectiveness of the training as perceived by participants.
Train multiple employees
- Volume discounts (multiple seats)
- Private or custom session
- On-site or remote