Kubernetes Observability Engineer

Build complete observability for Kubernetes clusters: kube-state-metrics, cAdvisor, node-exporter, pod log aggregation, and cluster health dashboards for platform teams.

Running applications on Kubernetes introduces a unique set of observability challenges: pods are ephemeral, services scale dynamically, namespaces multiply, and the layers of infrastructure between your application and the underlying node create new places for things to go wrong invisibly. The Kubernetes Observability Engineer helps platform teams and SREs build comprehensive visibility into every layer of their Kubernetes environment.

This assistant covers the full Kubernetes observability stack. For metrics, it works with kube-state-metrics for cluster object state (including resource quota usage), cAdvisor for container resource usage, node-exporter for underlying node metrics, and the Kubernetes Metrics Server, which serves the resource metrics consumed by the Horizontal Pod Autoscaler (HPA) and kubectl top. It helps you deploy and configure the kube-prometheus-stack (Prometheus Operator, Prometheus, Alertmanager, and Grafana) or integrate Kubernetes metrics into a managed observability platform like Datadog, New Relic, or Grafana Cloud.
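To make the division of labor between these metric sources concrete, the sketch below groups metric names by the component that conventionally exposes them. It relies on the usual naming prefixes (kube_ for kube-state-metrics, container_ for cAdvisor, node_ for node-exporter); the helper names metric_source and group_by_source are hypothetical, and a real cluster may expose metrics that do not follow these prefixes.

```python
# Minimal sketch: attribute a Prometheus metric name to its usual source.
# Assumption: the conventional name prefixes below; helper names are hypothetical.
SOURCE_PREFIXES = {
    "kube_": "kube-state-metrics",  # cluster object state (pods, deployments, quotas)
    "container_": "cAdvisor",       # per-container CPU, memory, filesystem usage
    "node_": "node-exporter",       # host-level CPU, memory, disk, network
}


def metric_source(metric_name):
    """Return the component that typically exposes this metric, or 'unknown'."""
    for prefix, source in SOURCE_PREFIXES.items():
        if metric_name.startswith(prefix):
            return source
    return "unknown"


def group_by_source(metric_names):
    """Group metric names into a dict keyed by their likely source."""
    grouped = {}
    for name in metric_names:
        grouped.setdefault(metric_source(name), []).append(name)
    return grouped


if __name__ == "__main__":
    names = [
        "kube_pod_container_status_restarts_total",
        "kube_resourcequota",
        "container_cpu_usage_seconds_total",
        "node_memory_MemAvailable_bytes",
    ]
    for source, metrics in group_by_source(names).items():
        print(source, metrics)
```

A grouping like this can help when auditing which exporter is missing from a cluster: an empty group for a given source usually means its scrape target is down or was never deployed.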

For logging, the assistant designs DaemonSet-based log collection with Fluent Bit or Filebeat, Kubernetes metadata enrichment that adds pod name, namespace, container name, and label data to every log line, and routing logic that sends logs to the right backend: Loki for cost-sensitive environments, Elasticsearch for full-text indexing requirements, or a commercial platform for managed operations.
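The enrichment and routing steps described above can be illustrated with a small sketch. It assumes the kubelet's log file naming convention, /var/log/containers/<pod>_<namespace>_<container>-<container-id>.log, which collectors such as Fluent Bit use to recover pod metadata. The functions enrich and route, the label key logging/full-text, and the routing rule itself are hypothetical; in a real collector the labels come from the Kubernetes API rather than a function argument.

```python
import re

# Assumption: kubelet-style container log paths with a 64-character hex container ID.
LOG_PATH_RE = re.compile(
    r"/var/log/containers/"
    r"(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)"
    r"-(?P<container_id>[0-9a-f]{64})\.log$"
)


def enrich(path, line, labels=None):
    """Attach Kubernetes metadata parsed from the log file path to a log line."""
    match = LOG_PATH_RE.search(path)
    if match is None:
        # Not a container log path: pass the line through without metadata.
        return {"log": line}
    return {
        "log": line,
        "kubernetes": {
            "pod_name": match.group("pod"),
            "namespace_name": match.group("namespace"),
            "container_name": match.group("container"),
            # Labels would normally be fetched from the API server.
            "labels": dict(labels or {}),
        },
    }


def route(record):
    """Pick a backend: a hypothetical opt-in label selects Elasticsearch, else Loki."""
    labels = record.get("kubernetes", {}).get("labels", {})
    if labels.get("logging/full-text") == "true":
        return "elasticsearch"
    return "loki"


if __name__ == "__main__":
    path = "/var/log/containers/web-7d9f_prod_my-app-" + "a" * 64 + ".log"
    record = enrich(path, "GET /healthz 200", {"logging/full-text": "true"})
    print(record["kubernetes"]["pod_name"], route(record))
```

Defaulting to the cheaper backend and opting individual workloads into full-text indexing is one possible policy; the same structure works if the decision is keyed on namespace instead of a label.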

The assistant helps you build the essential Kubernetes dashboards: cluster overview showing node capacity, pod scheduling, and resource utilization; namespace-level resource consumption for chargeback and quota management; workload health dashboards showing deployment rollout status, pod restarts, and OOMKill events; and HPA behavior dashboards showing scale-out and scale-in events correlated with traffic.
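As a rough illustration of what sits behind these dashboards, the sketch below generates one PromQL expression per panel. The metric names are those exposed by cAdvisor and by kube-state-metrics v2 (older kube-state-metrics releases used kube_hpa_* instead of kube_horizontalpodautoscaler_*); the function panel_queries, the panel keys, and the default windows are hypothetical choices, not a prescribed dashboard layout.

```python
def panel_queries(namespace=None, rate_window="5m"):
    """Return a dict of panel name -> PromQL string, optionally scoped to a namespace."""
    ns = f'namespace="{namespace}"' if namespace else ""

    def sel(*matchers):
        # Join the non-empty label matchers into a selector body.
        return ", ".join(m for m in (*matchers, ns) if m)

    cpu_sel = sel('container!=""')       # skip pod-level aggregate series
    oom_sel = sel('reason="OOMKilled"')
    ns_sel = sel()

    return {
        # Namespace-level CPU consumption (cAdvisor), in cores.
        "namespace_cpu_cores":
            f"sum by (namespace) (rate(container_cpu_usage_seconds_total{{{cpu_sel}}}[{rate_window}]))",
        # Container restarts per pod over the last hour (kube-state-metrics).
        "pod_restarts_1h":
            f"sum by (namespace, pod) (increase(kube_pod_container_status_restarts_total{{{ns_sel}}}[1h]))",
        # Containers whose last termination reason was OOMKilled.
        "oomkilled_containers":
            f"kube_pod_container_status_last_terminated_reason{{{oom_sel}}} == 1",
        # Rollout status: desired replicas minus available replicas.
        "deployment_unavailable_replicas":
            f"kube_deployment_spec_replicas{{{ns_sel}}} - kube_deployment_status_replicas_available{{{ns_sel}}}",
        # HPA behavior: current versus desired replica counts.
        "hpa_current_replicas":
            f"kube_horizontalpodautoscaler_status_current_replicas{{{ns_sel}}}",
        "hpa_desired_replicas":
            f"kube_horizontalpodautoscaler_status_desired_replicas{{{ns_sel}}}",
    }


if __name__ == "__main__":
    for panel, query in panel_queries("prod").items():
        print(f"{panel}: {query}")
```

Generating the queries from one function keeps the namespace filter consistent across panels, which matters when the same dashboard is reused for chargeback across many namespaces.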

Ideal users include platform engineers building a new Kubernetes observability stack, SREs investigating cluster-level performance problems, DevOps teams migrating from a VM-based monitoring setup to Kubernetes-native observability, and engineering organizations adopting multi-cluster Kubernetes who need visibility that scales across clusters.
