Key Takeaways
- ✓ 75% of Kubernetes organizations use Prometheus and Grafana for monitoring
- ✓ 6 steps to create operational dashboards in under 2 hours
TL;DR: This guide shows you how to create effective Grafana dashboards for Kubernetes monitoring in 6 steps: installing Grafana, connecting to Prometheus, creating cluster and pod visualizations, configuring alerts, and optimizing performance. You'll have operational dashboards in under 2 hours.
To deepen these skills, discover the LFD459 Kubernetes for Application Developers training.
Why Create Grafana Dashboards for Kubernetes?
Grafana Kubernetes metrics visualization has become the industry standard. According to Grafana Labs, 75% of organizations using Kubernetes adopt Prometheus and Grafana for their monitoring. This combination provides complete visibility into your cluster health.
"Kubernetes is no longer experimental but foundational. Soon, it will be essential to AI as well." - Chris Aniszczyk, CNCF State of Cloud Native 2026
A well-designed dashboard allows you to:
- Detect anomalies before they impact production
- Reduce MTTR (Mean Time To Recovery) by 40 to 60%
- Correlate metrics between infrastructure and applications
Key takeaway: Without adapted dashboards, you're monitoring without understanding. Grafana transforms raw metrics into actionable insights.
Prerequisites
Before starting, verify these elements:
| Component | Minimum Version | Verification |
|---|---|---|
| Kubernetes | 1.28+ | kubectl version |
| Helm | 3.12+ | helm version |
| Prometheus | Installed | kubectl get pods -n monitoring -l app=prometheus |
| kubectl | Configured | kubectl cluster-info |
Verify your cluster:
kubectl get nodes
# Expected result:
# NAME STATUS ROLES AGE VERSION
# master Ready control-plane 30d v1.29.2
# worker1 Ready <none> 30d v1.29.2
# worker2 Ready <none> 30d v1.29.2
If Prometheus is not installed, consult our guide Deploy the complete kube-prometheus stack in production environment.
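The readiness check above can be scripted; here is a minimal Python sketch (a hypothetical helper, not part of any official tooling) that parses the plain-text output of `kubectl get nodes`:

```python
def all_nodes_ready(kubectl_output: str) -> bool:
    """Return True if every node listed by `kubectl get nodes` reports Ready."""
    lines = kubectl_output.strip().splitlines()[1:]  # skip the header row
    statuses = [line.split()[1] for line in lines if line.strip()]
    return bool(statuses) and all(status == "Ready" for status in statuses)

sample = """NAME     STATUS   ROLES           AGE   VERSION
master   Ready    control-plane   30d   v1.29.2
worker1  Ready    <none>          30d   v1.29.2
worker2  Ready    <none>          30d   v1.29.2"""
print(all_nodes_ready(sample))  # True
```

In practice, `kubectl get nodes -o json` plus a JSON parser is more robust than splitting columns, but the column form matches the output shown above.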
Step 1: Install Grafana on Kubernetes with Helm
1.1 Add the Helm Grafana repository
Run these commands to configure Helm:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Expected result:
# "grafana" has been added to your repositories
# Hang tight while we grab the latest from your chart repositories...
# ...Successfully got an update from the "grafana" chart repository
1.2 Create the monitoring namespace
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
# Expected result:
# namespace/monitoring created (or unchanged if existing)
1.3 Deploy Grafana with optimized configuration
Create the values.yaml file:
# grafana-values.yaml
persistence:
  enabled: true
  size: 10Gi
adminPassword: "YourSecurePassword123!"
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-server.monitoring.svc.cluster.local
        access: proxy
        isDefault: true
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: 'default'
        folder: ''
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
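Before installing, it is worth sanity-checking the resources block: a request above its limit is rejected by the API server. A small Python sketch (an assumed helper with simplified Kubernetes quantity parsing, covering only the suffixes used in this guide):

```python
# Simplified quantity parsing: "m" for millicores, Ki/Mi/Gi for memory.
UNITS = {"m": 1e-3, "Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def parse_quantity(q: str) -> float:
    for suffix, factor in UNITS.items():
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * factor
    return float(q)

resources = {
    "requests": {"cpu": "100m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}

for key in ("cpu", "memory"):
    request, limit = resources["requests"][key], resources["limits"][key]
    assert parse_quantity(request) <= parse_quantity(limit), f"{key} request exceeds limit"
print("requests fit within limits")
```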
Install Grafana:
helm install grafana grafana/grafana \
--namespace monitoring \
--values grafana-values.yaml \
--version 7.3.0
# Expected result:
# NAME: grafana
# NAMESPACE: monitoring
# STATUS: deployed
# REVISION: 1
1.4 Verify deployment
kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana
# Expected result:
# NAME READY STATUS RESTARTS AGE
# grafana-7d5b6b8f4c-x2kj9 1/1 Running 0 2m
Key takeaway: Persistence is essential. Without PVC, your dashboards disappear when the pod restarts.
Step 2: Configure Prometheus Data Source
2.1 Access the Grafana interface
Expose Grafana temporarily:
kubectl port-forward svc/grafana -n monitoring 3000:80 &
# Expected result:
# Forwarding from 127.0.0.1:3000 -> 3000
Access http://localhost:3000 with credentials:
- User: admin
- Password: YourSecurePassword123!
2.2 Verify Prometheus connection
If you used the values.yaml above, Prometheus is already configured. Verify the connection:
- Go to Configuration → Data Sources (Connections → Data sources in Grafana 10+)
- Click on Prometheus
- Click on Test
# Expected result:
# ✓ Data source is working
If the connection fails, verify the Prometheus service URL:
kubectl get svc -n monitoring | grep prometheus
# Expected result:
# prometheus-server ClusterIP 10.96.45.123 <none> 80/TCP 30d
For teams preparing for CKAD certification, mastering these interconnections is covered in the LFD459 Kubernetes for Application Developers training.
Step 3: Create a Cluster Overview Dashboard
3.1 Create a new dashboard
- Click on + → New Dashboard
- Click on Add visualization
- Select Prometheus as the source
3.2 Add cluster CPU panel
Configure the PromQL query:
sum(rate(container_cpu_usage_seconds_total{namespace!="kube-system"}[5m])) by (namespace)
Panel parameters:
| Parameter | Value |
|---|---|
| Title | CPU by namespace |
| Visualization | Time series |
| Legend | {{namespace}} |
| Unit | none (the query returns CPU cores, not a percentage) |
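Queries like this are often assembled in provisioning scripts; here is a small sketch (a hypothetical helper function) that builds the panel's PromQL string:

```python
def cpu_by_namespace(exclude: str = "kube-system", window: str = "5m") -> str:
    """Build the cluster-CPU PromQL query, excluding one namespace."""
    return (
        f'sum(rate(container_cpu_usage_seconds_total{{namespace!="{exclude}"}}[{window}]))'
        " by (namespace)"
    )

print(cpu_by_namespace())
```

Note the doubled braces in the f-string: `{{` and `}}` emit the literal `{`/`}` that PromQL label selectors require.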
3.3 Add cluster memory panel
sum(container_memory_working_set_bytes{namespace!="kube-system"}) by (namespace) / 1024 / 1024 / 1024
Configuration:
- Title: Memory by namespace (GiB)
- Unit: gibibytes
3.4 Add pods by state
sum(kube_pod_status_phase) by (phase)
Create a stat panel with these values:
# Panel configuration
Visualization: Stat
Calculation: Last
Color mode: Value
Graph mode: None
Text mode: Value and name
According to the CNCF 2025 report, 82% of container users run Kubernetes in production, making this type of monitoring essential.
Key takeaway: Always start with a global view before drilling down to pod level. This top-down approach accelerates diagnosis.
Step 4: Create a Pod Monitoring Dashboard
4.1 Dashboard with dynamic variables
Add variables to filter dynamically:
- Go to Dashboard Settings → Variables
- Create the namespace variable:
Name: namespace
Type: Query
Data source: Prometheus
Query: label_values(kube_pod_info, namespace)
Refresh: On dashboard load
- Create the pod variable:
Name: pod
Type: Query
Query: label_values(kube_pod_info{namespace="$namespace"}, pod)
Refresh: On time range change
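Under the hood, `label_values()` resolves to Prometheus' label-values endpoint. This sketch (an illustration of the API shape, not Grafana's actual code) builds the equivalent URLs:

```python
from urllib.parse import urlencode

def label_values_url(base: str, label: str, selector: str = "") -> str:
    """URL for Prometheus' /api/v1/label/<name>/values endpoint,
    optionally filtered with a match[] series selector."""
    url = f"{base}/api/v1/label/{label}/values"
    if selector:
        url += "?" + urlencode({"match[]": selector})
    return url

base = "http://prometheus-server.monitoring.svc.cluster.local"
print(label_values_url(base, "namespace"))
print(label_values_url(base, "pod", 'kube_pod_info{namespace="prod"}'))
```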
4.2 CPU per pod panel
sum(rate(container_cpu_usage_seconds_total{namespace="$namespace", pod="$pod"}[5m])) by (container)
4.3 Memory per pod panel
sum(container_memory_working_set_bytes{namespace="$namespace", pod="$pod"}) by (container) / 1024 / 1024
4.4 Network I/O panel
# Received bytes
sum(rate(container_network_receive_bytes_total{namespace="$namespace", pod="$pod"}[5m]))
# Transmitted bytes
sum(rate(container_network_transmit_bytes_total{namespace="$namespace", pod="$pod"}[5m]))
Use a graph with two series:
- Receive bytes: green color
- Transmit bytes: blue color
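Grafana renders these rates with IEC units; the sketch below (a rough re-implementation for illustration, not Grafana's formatting code) shows the conversion its "bytes/sec" unit applies:

```python
def humanize_rate(bytes_per_sec: float) -> str:
    """Render a raw bytes/sec value with IEC units, Grafana-style."""
    for unit in ("B/s", "KiB/s", "MiB/s", "GiB/s"):
        if bytes_per_sec < 1024:
            return f"{bytes_per_sec:.1f} {unit}"
        bytes_per_sec /= 1024
    return f"{bytes_per_sec:.1f} TiB/s"

print(humanize_rate(3_250_000))  # "3.1 MiB/s"
print(humanize_rate(512))        # "512.0 B/s"
```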
For diagnosing pod problems, see Debug a pod in CrashLoopBackOff on Kubernetes.
Step 5: Configure Grafana Alerts
5.1 Create a CPU alert rule
- Edit a CPU panel
- Go to the Alert tab
- Click on Create alert rule
Alert configuration:
Alert name: High CPU Usage
Evaluate every: 1m
For: 5m
Condition: WHEN avg() OF query(A, 5m, now) IS ABOVE 80
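The `For: 5m` clause is what separates Pending from Alerting. This toy state machine (a simplified model of Grafana's evaluation loop, not its real implementation) replays one sample per evaluation interval:

```python
def alert_state(samples, threshold=80.0, for_minutes=5, step_minutes=1):
    """Fire only after the condition has held for `for_minutes` straight."""
    breached = 0
    state = "Normal"
    for value in samples:
        breached = breached + step_minutes if value > threshold else 0
        if breached >= for_minutes:
            state = "Alerting"
        elif breached > 0:
            state = "Pending"
        else:
            state = "Normal"
    return state

print(alert_state([85, 90, 88, 92, 95]))  # "Alerting": breach held 5 minutes
print(alert_state([85, 90, 60, 92, 95]))  # "Pending": the dip to 60 reset the timer
```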
5.2 Configure a contact point
# Example Slack webhook configuration
kubectl create secret generic grafana-slack-webhook \
--from-literal=url='https://hooks.slack.com/services/XXX/YYY/ZZZ' \
-n monitoring
In Grafana:
- Alerting → Contact points → New contact point
- Type: Slack
- Webhook URL: paste the URL stored in the secret directly. The UI field does not expand variables such as $(SLACK_WEBHOOK_URL); secrets are only injected when contact points are provisioned from files.
5.3 Verify configured alerts
With the port-forward from step 2.1 still active, query Grafana's admin stats API (grafana-cli has no stats command):
curl -s -u admin:'YourSecurePassword123!' http://localhost:3000/api/admin/stats
# Expected result (truncated):
# {"dashboards":5,"alerts":3,...}
"Don't let your knowledge remain theoretical - set up a real Kubernetes environment to solidify your skills." - TealHQ Kubernetes DevOps Guide
Step 6: Optimize Dashboard Performance
6.1 Reduce query cardinality
Bad practice:
# ❌ Explosive cardinality
container_cpu_usage_seconds_total
Good practice:
# ✓ Immediate aggregation
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total[5m]))
6.2 Configure caching
Add these parameters to values.yaml:
grafana.ini:
  dataproxy:
    timeout: 30
    keep_alive_seconds: 30
  caching:
    backend: database
6.3 Define appropriate refresh intervals
| Dashboard type | Recommended interval |
|---|---|
| Real-time view | 10s |
| Normal operations | 30s |
| Historical reports | 5m |
Configure in Dashboard Settings → Time options:
Auto-refresh: 30s
Time range: Last 6 hours
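The refresh interval directly multiplies query load on Prometheus; a quick back-of-the-envelope calculation (illustrative numbers only):

```python
def queries_per_hour(panels: int, refresh_seconds: int) -> int:
    """Each auto-refresh re-runs every panel query once."""
    return panels * (3600 // refresh_seconds)

for interval in (10, 30, 300):
    print(f"{interval:>3}s refresh, 12 panels -> {queries_per_hour(12, interval)} queries/hour")
```

Moving a 12-panel dashboard from a 10s to a 30s refresh cuts the load from 4,320 to 1,440 queries per hour.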
For a complete approach to Kubernetes Monitoring and Troubleshooting, explore our other practical guides.
Troubleshooting Common Issues
Dashboard doesn't load data
Verify Prometheus connectivity:
kubectl exec -it $(kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana -o jsonpath='{.items[0].metadata.name}') \
-n monitoring -- wget -qO- http://prometheus-server.monitoring.svc.cluster.local/api/v1/status/runtimeinfo
# Expected result:
# {"status":"success","data":{...}}
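A probe script can assert on that payload instead of eyeballing it; a minimal sketch (hypothetical helper):

```python
import json

def prometheus_healthy(raw: str) -> bool:
    """Validate the body returned by /api/v1/status/runtimeinfo."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return body.get("status") == "success"

print(prometheus_healthy('{"status":"success","data":{"startTime":"2025-01-01T00:00:00Z"}}'))  # True
print(prometheus_healthy("connection refused"))  # False
```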
PromQL queries too slow
Analyze with:
# Check cardinality
count by (__name__)({__name__=~".+"})
If a metric exceeds 100,000 series, add filters:
# Filter by namespace
sum(rate(container_cpu_usage_seconds_total{namespace=~"prod|staging"}[5m]))
Grafana pod in CrashLoopBackOff
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana --tail=50
# Look for permission or PVC errors
Common solution (warning: deleting the PVC also deletes any dashboards stored in it):
kubectl delete pvc grafana -n monitoring
helm upgrade grafana grafana/grafana -n monitoring --values grafana-values.yaml
Consult the complete guide Debug a pod in CrashLoopBackOff for more complex cases.
Recommended Community Dashboards
Import these dashboards from Grafana.com:
| ID | Name | Usage |
|---|---|---|
| 315 | Kubernetes cluster monitoring | Global view |
| 13332 | kube-state-metrics v2 | K8s object states |
| 6417 | Kubernetes Pods | Pod detail |
| 14205 | Node Exporter Full | System metrics |
Import via the HTTP API (grafana-cli cannot import dashboards; keep the port-forward from step 2.1 active):
curl -s https://grafana.com/api/dashboards/315/revisions/latest/download -o dashboard.json
curl -s -u admin:'YourSecurePassword123!' -X POST http://localhost:3000/api/dashboards/import \
  -H "Content-Type: application/json" \
  -d "{\"dashboard\": $(cat dashboard.json), \"overwrite\": true, \"inputs\": [{\"name\": \"DS_PROMETHEUS\", \"type\": \"datasource\", \"pluginId\": \"prometheus\", \"value\": \"Prometheus\"}]}"
You can also import by ID directly in the UI: Dashboards → New → Import.
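For scripted imports, Grafana's POST /api/dashboards/import endpoint expects a body like the one this sketch builds (DS_PROMETHEUS is the conventional datasource placeholder name; check the dashboard's JSON to confirm what it actually declares):

```python
import json

def build_import_payload(dashboard: dict, datasource: str = "Prometheus") -> str:
    """JSON body for Grafana's dashboard import API."""
    return json.dumps({
        "dashboard": dashboard,
        "overwrite": True,
        "inputs": [{
            "name": "DS_PROMETHEUS",  # must match the input declared by the dashboard
            "type": "datasource",
            "pluginId": "prometheus",
            "value": datasource,      # name or UID of your Prometheus data source
        }],
    })

payload = build_import_payload({"id": None, "title": "Kubernetes cluster monitoring"})
print(json.loads(payload)["overwrite"])  # True
```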
To master the complete observability stack, see our guide Complete guide: install and configure Prometheus on Kubernetes.
Take Action: Get Kubernetes Monitoring Training
Creating effective Grafana dashboards for Kubernetes monitoring is an essential skill for any infrastructure engineer or Cloud-Native developer. With 82% of organizations using Kubernetes in production (CNCF 2025), this expertise positions you in a market where average salary reaches $152,640/year (Ruby On Remote).
"Demand and salaries for highly-skilled and qualified tech talent are fiercer than ever, and certifications present a clear pathway for IT professionals to further their careers." - Hired CTO via Splunk
Recommended trainings:
- LFS458 Kubernetes Administration: Master complete Kubernetes cluster administration (4 days, CKA preparation)
- LFD459 Kubernetes for Application Developers: Develop and deploy containerized applications (3 days, CKAD preparation)
- Kubernetes Fundamentals: Discover essential concepts in one day
Contact our advisors to plan your Kubernetes skills development.