Prometheus Metrics Architect

Design Prometheus metric schemas, write PromQL queries and recording rules, manage cardinality, and build scalable metrics infrastructure for cloud-native systems.

Prometheus is the de facto standard for metrics collection in cloud-native environments — but using it well requires much more than installing an exporter and scraping endpoints. The Prometheus Metrics Architect helps platform engineers, SREs, and backend developers design metric schemas, write precise PromQL queries, manage cardinality at scale, and build metrics infrastructure that remains performant as systems grow.

This assistant covers the full Prometheus stack: instrumentation libraries for exposing custom metrics in Go, Java, Python, and other languages; service discovery and scrape configuration for Kubernetes and other dynamic environments; alerting and recording rules, along with Alertmanager routing; federation and remote write for multi-cluster setups and long-term storage with Thanos or Cortex; and Grafana dashboard design backed by well-structured PromQL.

When you describe what you want to measure — request latency distributions, queue depth, business-level KPIs, infrastructure saturation — the assistant helps you choose the right metric type (counter, gauge, histogram, or summary), design label schemas that are query-friendly without causing cardinality explosions, and write the instrumentation code. It explains why a poorly chosen label — like including a user ID or request ID — can bring a Prometheus server to its knees, and how to get the analytical flexibility you need from high-cardinality dimensions without paying the storage cost.
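To make the cardinality point concrete, here is a rough sketch in Python of how label choices multiply into time series. The label names and value counts are hypothetical, and the product is only an upper bound, since not every combination of label values necessarily occurs in practice.

```python
from math import prod


def estimated_series(label_values: dict[str, int]) -> int:
    """Upper bound on the time series one metric can produce.

    Each unique combination of label values is a separate series, so the
    worst case is the product of the distinct-value counts per label.
    """
    return prod(label_values.values())


# Hypothetical label schema for an HTTP request counter.
bounded = {"method": 5, "status_class": 5, "route": 40}

# The same schema with a per-user label added (assumed 100,000 users).
unbounded = {**bounded, "user_id": 100_000}

print(estimated_series(bounded))    # 1000
print(estimated_series(unbounded))  # 100000000
```

Under these assumed numbers, one extra label turns a thousand series into a hundred million, which is why identifiers such as user IDs or request IDs are usually kept out of label sets.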

For PromQL, the assistant generates queries for common observability patterns — rate calculations, histogram quantiles, aggregations across Kubernetes labels, ratio queries for SLI computation — and explains the semantics of every function and operator so you understand what you are running. It also writes recording rules that pre-compute expensive queries for dashboard performance and alert evaluation efficiency.
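As an illustration of what understanding the semantics can mean for histogram quantiles, the following Python sketch mimics the linear-interpolation idea behind PromQL's histogram_quantile for classic cumulative buckets. It is a simplified approximation, not the Prometheus implementation: it assumes buckets are already sorted by upper bound, end with a +Inf bucket, and that the lowest bucket starts at 0. The bucket data is made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: assumes sorted buckets whose last upper bound is +Inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last being +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so no meaningful quantile

    rank = q * total  # position of the requested observation
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if math.isinf(bound) or in_bucket == 0:
                # Cannot interpolate into +Inf or an empty bucket;
                # fall back to the highest known lower bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical latency buckets in seconds: (upper bound, cumulative count).
latency = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, latency)  # roughly 0.75 for this data
```

The sketch shows why bucket boundaries matter: the estimate can only be as precise as the bucket that contains the requested rank, and anything landing in the +Inf bucket is reported as the highest finite bound.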

Ideal users include engineers setting up Prometheus for the first time in a Kubernetes cluster, teams debugging high memory usage and slow query performance caused by cardinality problems, SREs building alerting rule libraries, and platform teams migrating from a legacy metrics system to a Prometheus-native stack.
