Logging, Monitoring, and Observability in Google Cloud
This course teaches participants techniques for monitoring and improving infrastructure and application performance in Google Cloud. Using a combination of presentations, demos, hands-on labs, and real-world case studies, attendees gain experience with full-stack monitoring, real-time log management and analysis, debugging code in production, tracing application performance bottlenecks, and profiling CPU and memory usage.

What you will learn
- Explain the purpose and capabilities of Google Cloud Observability.
- Implement monitoring for multiple cloud projects.
- Create alerting policies, uptime checks, and alerts.
- Install and manage the Ops Agent to collect logs from Compute Engine instances.
- Explain Cloud Operations for GKE.
- Analyze VPC Flow Logs and firewall rules logs.
- Analyze and export Cloud Audit Logs.
- Profile and identify resource-intensive functions in an application.
- Analyze resource utilization cost for monitoring-related components within Google Cloud.
Prerequisites
- Complete the Google Cloud Fundamentals: Core Infrastructure course or have equivalent experience.
- Have basic scripting or coding familiarity.
- Be proficient with command-line tools and Linux operating system environments.
Target audience
- Cloud architects, administrators, and SysOps personnel
- Cloud developers and DevOps personnel
Training Program
9 modules to master the fundamentals
Objectives
- Describe the purpose and capabilities of Google Cloud Observability.
- Explain the purpose of the Cloud Monitoring tool.
- Explain the purpose of Cloud Logging and Error Reporting tools.
- Explain the purpose of Application Performance Management tools.
Topics covered
- Google Cloud Observability purpose and capabilities
- Cloud Monitoring tool
- Cloud Logging and Error Reporting tools
- Application Performance Management tools
Activities
One quiz
Objectives
- Use Cloud Monitoring to view metrics for multiple cloud projects.
- Explain the different types of dashboards and charts that can be built.
- Create an uptime check.
- Explain the cloud operations architecture.
- Explain and demonstrate the purpose of using Monitoring Query Language (MQL) for monitoring.
Topics covered
- Using Cloud Monitoring for multiple cloud projects
- Dashboards and charts
- Uptime checks
- Cloud operations architecture
- Monitoring Query Language (MQL)
Activities
One quiz
One lab
Objectives
- Explain alerting strategies.
- Explain alerting policies.
- Explain error budget.
- Explain why service-level indicators (SLIs), service-level objectives (SLOs), and service-level agreements (SLAs) are important.
- Identify types of alerts and common uses for each.
- Use Cloud Monitoring to manage services.
Topics covered
- Alerting strategies
- Alerting policies
- Error budget
- Service-level indicators (SLIs), service-level objectives (SLOs), and service-level agreements (SLAs)
- Types of alerts and their uses
- Managing services with Cloud Monitoring
Activities
One quiz
One lab
Objectives
- Use Logs Explorer features.
- Explain the features and benefits of logs-based metrics.
- Define log sinks (inclusion filters) and exclusion filters.
- Explain how BigQuery can be used to analyze logs.
- Export logs to BigQuery for analysis.
- Use Log Analytics on Google Cloud.
Topics covered
- Logs Explorer features
- Logs-based metrics
- Log sinks (inclusion and exclusion filters)
- Log analysis using BigQuery
- Log Analytics on Google Cloud
Activities
One quiz
One lab
Objectives
- Explain Cloud Audit Logs.
- List and explain different audit logs.
- Explain the features and functionalities of the different audit logs.
- List the best practices to implement audit logs.
Topics covered
- Cloud Audit Logs overview
- Types of audit logs
- Features and functionalities of audit logs
- Best practices for implementing audit logs
Activities
One quiz
One lab
Objectives
- Use the Ops Agent with Compute Engine.
- Enable and use Kubernetes Monitoring.
- Explain the benefits of using Google Cloud Managed Service for Prometheus.
- Explain the use of PromQL to query Cloud Monitoring metrics.
- Explain the uses of OpenTelemetry.
- Explain custom metrics.
Topics covered
- Using the Ops Agent with Compute Engine
- Kubernetes Monitoring
- Google Cloud Managed Service for Prometheus
- Using PromQL to query Cloud Monitoring metrics
- Uses of OpenTelemetry
- Custom metrics
Activities
One quiz
One lab
Objectives
- Collect and analyze VPC Flow Logs and firewall rules logs.
- Enable and monitor Packet Mirroring.
- Explain the capabilities of the Network Intelligence Center.
Topics covered
- Collecting and analyzing VPC Flow Logs and firewall rules logs
- Packet Mirroring
- Network Intelligence Center capabilities
Activities
One quiz
One lab
Objectives
- Explain the features, benefits, and functionalities of Error Reporting, Cloud Trace, and Cloud Profiler.
Topics covered
- Features of Error Reporting
- Features of Cloud Trace
- Features of Cloud Profiler
Activities
One quiz
One lab
Objectives
- Analyze resource utilization cost for monitoring-related components within Google Cloud.
- Implement best practices for controlling the cost of monitoring within Google Cloud.
Topics covered
- Analyzing resource utilization cost for monitoring components
- Best practices for controlling monitoring costs
Activities
One quiz
Quality Process
SFEIR Institute's commitment: an excellence approach to ensure the quality and success of all our training programs.
- Lectures / Theoretical Slides — Presentation of concepts using visual aids (PowerPoint, PDF).
- Technical Demonstration (Demos) — The instructor performs a task or procedure while students observe.
- Guided Labs — Guided practical exercises on software, hardware, or technical environments.
- Quiz / MCQ — Quick knowledge check (paper-based or digital via tools like Kahoot/Klaxoon).
The achievement of training objectives is evaluated at multiple levels to ensure quality:
- Continuous Knowledge Assessment: Verification of knowledge throughout the training via participatory methods (quizzes, practical exercises, case studies) under instructor supervision.
- Progress Measurement: Comparative self-assessment system including an initial diagnostic to determine the starting level, followed by a final evaluation to validate skills development.
- Quality Evaluation: End-of-session satisfaction questionnaire to measure the relevance and effectiveness of the training as perceived by participants.
Train multiple employees
- Volume discounts (multiple seats)
- Private or custom session
- On-site or remote