Key Takeaways
- ✓ Prometheus dominates with 67% production adoption (Grafana Labs 2025)
- ✓ Kubernetes monitoring relies on three pillars: metrics, logs, and traces
- ✓ SaaS solutions offer turnkey integration; open source provides control and reduced cost
TL;DR
Kubernetes monitoring relies on three pillars: metrics, logs, and traces. Prometheus dominates with 67% production adoption according to the Grafana Labs 2025 Observability Survey.
SaaS solutions like Datadog offer turnkey integration. Choose your stack based on your budget, internal skills, and alerting needs. This guide walks you through evaluating each tool step by step.
Professionals who want to master Kubernetes administration follow the LFS458 Kubernetes Administration training.
Prerequisites for Kubernetes Software Engineers
Before comparing tools, verify that you have the following:
- A working Kubernetes cluster (see our multi-node installation guide with kubeadm)
- Administrator access to the cluster (kubectl configured)
- Knowledge of basic concepts: Pods, Services, Deployments
- Familiarity with essential kubectl commands
Remember: 82% of container users run Kubernetes in production in 2025 (CNCF Annual Survey 2025). You must monitor your clusters.
Step 1: Understand the Kubernetes Monitoring Landscape
Why Monitoring is Critical for You
According to Cloud Native Now, IT teams spend 34 working days per year resolving Kubernetes problems. Effective monitoring drastically reduces this time.
For you as a Kubernetes software engineer, this means you need to observe every layer of your infrastructure.
The Three Pillars of Observability
Identify the three types of data to collect:
- Metrics: CPU, memory, network latency
- Logs: application and system events
- Traces: distributed request paths
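As a rough first look at each pillar on a running cluster, the sketch below wraps commands you likely already have. The function name pillar_snapshot and its arguments are hypothetical; kubectl top assumes metrics-server is installed, and traces have no kubectl equivalent, so the function only prints a reminder for them.

```shell
# Hypothetical helper: quick look at metrics and logs for one workload.
# Usage: pillar_snapshot <namespace> <deployment>
pillar_snapshot() {
  echo "== Metrics (requires metrics-server) =="
  kubectl top pods -n "$1"

  echo "== Logs (last 10 minutes, 50 lines max) =="
  kubectl logs -n "$1" "deployment/$2" --since=10m --tail=50

  echo "== Traces =="
  echo "No kubectl equivalent: query your tracing backend (for example Jaeger)."
}
```

Call it as pillar_snapshot default my-app, replacing the namespace and Deployment name with your own.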
See our article on Kubernetes 2025 trends to understand how practices are evolving.
Step 2: Evaluate Prometheus + Grafana
Installing the Stack
Prometheus and Grafana represent the open source standard. Deploy the stack via Helm:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
Verify the installation:
kubectl get pods -n monitoring
Expected output (abbreviated; the generated name suffixes will differ on your cluster):
NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-kube-prometheus-operator-7d4b6f5b6c-xyz12   1/1     Running   0          2m
prometheus-prometheus-kube-prometheus-prometheus-0     2/2     Running   0          2m
prometheus-grafana-6b8c9f4d5b-abc34                    3/3     Running   0          2m
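If you would rather script the check than read the pod list by eye, one possible approach is to wait for every rollout in the namespace to complete. The function name wait_for_rollouts and the 180-second timeout are assumptions for illustration; kubectl rollout status is the standard command underneath.

```shell
# Hypothetical helper: block until every Deployment, StatefulSet and DaemonSet
# in a namespace has finished rolling out, or fail at the first timeout.
# Usage: wait_for_rollouts [namespace]   (defaults to "monitoring")
wait_for_rollouts() {
  ns="${1:-monitoring}"
  for resource in $(kubectl get deployments,statefulsets,daemonsets -n "$ns" -o name); do
    kubectl rollout status "$resource" -n "$ns" --timeout=180s || return 1
  done
}
```

Running wait_for_rollouts monitoring returns a non-zero exit code if any component fails to become ready, which makes it usable in a CI pipeline.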
Strengths for Your Team
- Cost: free (open source)
- Flexibility: you configure each dashboard
- Community: 67% production adoption according to Grafana Labs 2025
Limitations to Consider
- Maintenance: you manage storage and high availability
- Learning curve: PromQL takes time
Remember: If you are comfortable with Kubernetes cluster administration, Prometheus remains your best-value option.
Step 3: Test Datadog for Managed Monitoring
Deploying the Datadog Agent
Install the agent via Helm:
helm repo add datadog https://helm.datadoghq.com
helm repo update
helm install datadog datadog/datadog \
  --set datadog.apiKey=YOUR_API_KEY \
  --set datadog.site='datadoghq.eu' \
  -n datadog --create-namespace
Confirm the deployment:
kubectl get daemonset -n datadog
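Passing the API key with --set leaves it in your shell history and in the Helm release values. A possible alternative, sketched below, stores the key in a Kubernetes Secret and points the chart at it through the datadog.apiKeyExistingSecret value. The function name, the Secret name datadog-secret, and the DD_API_KEY environment variable are assumptions for illustration; the datadog Helm repo is assumed to be added already.

```shell
# Hypothetical helper: store the API key in a Secret and reference it from the chart.
# Assumes DD_API_KEY is exported in your shell.
install_datadog_with_secret() {
  # Create the namespace if it does not exist yet (idempotent).
  kubectl create namespace datadog --dry-run=client -o yaml | kubectl apply -f -
  # The chart expects the key under the "api-key" entry of the Secret.
  kubectl create secret generic datadog-secret \
    --from-literal=api-key="$DD_API_KEY" -n datadog
  helm upgrade --install datadog datadog/datadog \
    --set datadog.apiKeyExistingSecret=datadog-secret \
    --set datadog.site='datadoghq.eu' \
    -n datadog
}
```

The Secret creation step fails if the Secret already exists, so delete or update it by hand when you rotate the key.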
Advantages for Kubernetes Software Engineers
- Native integration: service auto-discovery
- Pre-built dashboards: operational in minutes
- APM included: distributed traces without configuration
Disadvantages to Evaluate
- Cost: per-host billing (about €15-23 per host per month; see the comparison table in Step 5)
- Dependency: your monitoring data is stored with a third party
Compare with your needs in Kubernetes monitoring and troubleshooting.
Step 4: Explore Alternatives
New Relic One
New Relic offers a "data-first" model: you pay per GB ingested. Consider it if your data volumes vary from month to month.
kubectl apply -f https://download.newrelic.com/kubernetes-manifests/newrelic-bundle.yaml
Dynatrace
Dynatrace excels in auto-instrumentation. Its OneAgent automatically detects your workloads.
Elastic Stack (ELK)
To centralize logs and metrics, deploy Elastic:
helm repo add elastic https://helm.elastic.co
helm repo update
helm install elasticsearch elastic/elasticsearch -n logging --create-namespace
helm install kibana elastic/kibana -n logging
See our guide on deployment tools to understand prerequisites.
Step 5: Compare Tools by Your Criteria
Complete Comparison Table
| Criterion | Prometheus + Grafana | Datadog | New Relic | Dynatrace |
|---|---|---|---|---|
| Monthly cost | €0 (infra only) | €15-23/host | Variable/GB | €21-69/host |
| Installation | Helm (10 min) | Helm (5 min) | YAML (5 min) | Operator (10 min) |
| K8s metrics | Native | Native | Native | Native |
| APM/Traces | Separate (e.g. Jaeger) | Included | Included | Included |
| Alerting | Alertmanager | Included | Included | Included |
| Retention | You manage | 15 days (base plan) | 8 days | 35 days |
| Support | Community | 24/7 | 24/7 | 24/7 |
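To turn the per-host prices in the table into a monthly budget, multiply each end of the range by your host count. The sketch below does only that arithmetic; the function name monthly_cost_range and the 20-host figure are assumptions, and the result ignores add-ons, discounts, and New Relic's per-GB model.

```shell
# Hypothetical helper: multiply a per-host price range by a host count.
# Usage: monthly_cost_range <hosts> <low_eur_per_host> <high_eur_per_host>
monthly_cost_range() {
  echo "$(( $1 * $2 ))-$(( $1 * $3 ))"
}

monthly_cost_range 20 15 23   # Datadog, 20 hosts: prints 300-460
monthly_cost_range 20 21 69   # Dynatrace, 20 hosts: prints 420-1380
```

For Prometheus + Grafana, the comparable figure is the cost of the nodes and storage you dedicate to the stack, which depends on your retention and cardinality.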
Which Solution for Which Profile?
Choose Prometheus + Grafana if:
- You have strong internal skills
- Your infrastructure budget is limited
- You want total control
Opt for Datadog if:
- You prioritize speed of implementation
- Your team lacks monitoring expertise
- You have a validated SaaS budget
Remember: 70% of organizations run Kubernetes in the cloud, and most use Helm to simplify their installations (Orca Security 2025).
Step 6: Configure Alerting for Your Kubernetes Environment
Create a Prometheus Rule
Define a CPU alert in an alert-rules.yaml file:
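apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-alerts
  namespace: monitoring
  labels:
    # Assumption: the Helm release is named "prometheus" as in Step 2. By default,
    # the chart's Prometheus only loads rules that carry the release label.
    release: prometheus
spec:
  groups:
    - name: cpu
      rules:
        - alert: HighCPUUsage
          # Fires when a pod averages more than 0.8 CPU cores over 5 minutes.
          # container!="" skips the pod-level cgroup series to avoid double counting.
          expr: sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High CPU on {{ $labels.pod }}"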
Apply the configuration:
kubectl apply -f alert-rules.yaml
Verify Activation
Access the Prometheus interface:
kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090:9090
Navigate to http://localhost:9090/alerts to confirm your rule appears.
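The same check can be scripted against the Prometheus HTTP API, whose /api/v1/rules endpoint lists the loaded rule groups. The sketch below is a crude text match, assuming the port-forward above is still running and that the API returns compact JSON; the function name rule_loaded is hypothetical, and a JSON parser such as jq would be more robust. Note that it also matches rule group names.

```shell
# Hypothetical helper: succeed if a rule (or group) with this name is loaded.
# Usage: rule_loaded <alert_name> [prometheus_url]
rule_loaded() {
  curl -s "${2:-http://localhost:9090}/api/v1/rules" | grep -q "\"name\":\"$1\""
}
```

For example, rule_loaded HighCPUUsage && echo "rule loaded" prints a confirmation only when the rule is present.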
Verify Your Monitoring Stack
Run these commands to validate your installation:
# Check monitoring pods
kubectl get pods -n monitoring -o wide
# Test the Metrics API (requires metrics-server, which kube-prometheus-stack does not install)
kubectl top nodes
kubectl top pods --all-namespaces
# Check ServiceMonitors
kubectl get servicemonitors -n monitoring
Expected output for kubectl top nodes:
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-master    256m         12%    1024Mi          26%
node-worker1   512m         25%    2048Mi          52%
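To flag busy nodes automatically, you can filter this output on the CPU% column. The sketch below assumes the column layout shown above (CPU% in the third column); the function name hot_nodes and the 20% threshold are illustrative choices.

```shell
# Hypothetical helper: print nodes whose CPU% exceeds a threshold.
# Reads "kubectl top nodes" output on stdin. Usage: hot_nodes <percent>
hot_nodes() {
  awk -v limit="$1" 'NR > 1 { cpu = $3; sub(/%/, "", cpu); if (cpu + 0 > limit) print $1 }'
}
```

With the sample output above, kubectl top nodes | hot_nodes 20 would print node-worker1 only.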
See our documentation on node management to optimize your resources.
Troubleshooting Common Issues
Prometheus Not Collecting Metrics
Check ServiceMonitors:
kubectl get servicemonitors -A
kubectl describe servicemonitor prometheus-kube-prometheus-kubelet -n monitoring
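Ensure the labels on each ServiceMonitor match the serviceMonitorSelector of your Prometheus resource. With the chart's default values, that means carrying the release label of your Helm release.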
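One way to compare the two sides is to print the selector configured on the Prometheus resource next to the labels of each ServiceMonitor. This is a sketch: the function name compare_servicemonitor_labels is hypothetical, and it assumes the Prometheus Operator CRDs installed by the chart.

```shell
# Hypothetical helper: show the selector Prometheus uses next to the labels
# on each ServiceMonitor, so mismatches are easy to spot.
# Usage: compare_servicemonitor_labels [namespace]   (defaults to "monitoring")
compare_servicemonitor_labels() {
  ns="${1:-monitoring}"
  echo "Prometheus serviceMonitorSelector:"
  kubectl get prometheus -n "$ns" \
    -o jsonpath='{.items[*].spec.serviceMonitorSelector}'
  echo
  echo "ServiceMonitor labels:"
  kubectl get servicemonitors -n "$ns" --show-labels
}
```

A ServiceMonitor whose LABELS column lacks the key and value shown in the selector is ignored by that Prometheus instance.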
Grafana Not Connecting to Prometheus
Check the datasource:
kubectl logs -n monitoring deployment/prometheus-grafana -c grafana | grep -i prometheus
Alerts Not Firing
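Test your PromQL expression directly in the Prometheus interface, first without the threshold to see the actual values. Then confirm that your workloads can actually reach the threshold, and that the condition holds for the full duration set in the rule's for field.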
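If you prefer the command line, the Prometheus HTTP API exposes instant queries at /api/v1/query. The sketch below assumes Prometheus is reachable on localhost:9090 through a port-forward; the function name promql_query is hypothetical.

```shell
# Hypothetical helper: run an instant PromQL query over the HTTP API.
# Usage: promql_query '<expression>' [prometheus_url]
promql_query() {
  curl -sG "${2:-http://localhost:9090}/api/v1/query" --data-urlencode "query=$1"
}
```

For example, promql_query 'sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)' returns the per-pod values as JSON, which you can compare against your alert threshold.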
For deeper troubleshooting, see our complete Kubernetes training guide.
Recommendations by Use Case
Startup or SMB
Prefer Prometheus + Grafana. You control costs and develop valuable internal skills. To train effectively, explore Kubernetes fundamentals.
Large Enterprise with Multiple Clusters
Consider Datadog or Dynatrace. Centralization simplifies governance. According to Spectro Cloud, 80% of organizations manage an average of 20+ clusters.
Regulated Environment
Deploy an on-premise stack (Prometheus, Thanos, Grafana). You keep your data internal.
Take Action: Train on Kubernetes Monitoring
Monitoring is a key skill for any Kubernetes software engineer. If you run Kubernetes, master every aspect of it, including observability.
Recommended Training
- LFS458 Kubernetes Administration: 4 days to prepare for CKA certification, including cluster monitoring
- LFD459 Kubernetes for Developers: 3 days focused on deployment and application observability
- Kubernetes Fundamentals: 1 day to discover essential concepts
Contact our advisors to build your personalized training path.