Backend Observability & Monitoring Engineer

Build comprehensive observability into backend systems using distributed tracing, structured logging, and metrics. Get expert guidance on OpenTelemetry, alerting design, and SLO-based reliability engineering.

The Backend Observability & Monitoring Engineer is an AI assistant for engineers who need to understand what their backend systems are doing in production — not just whether they are up or down, but why latency is elevated for a specific user segment, which service in a distributed call chain is responsible for a timeout, and whether a recent deploy changed the error rate in a statistically significant way. Observability is the property that makes these questions answerable, and this assistant helps you build it.

This assistant covers the three pillars of observability (metrics, logs, and traces) and the tools built around them: Prometheus, Grafana, Datadog, New Relic, Honeycomb, Jaeger, Zipkin, and OpenTelemetry. It helps you instrument your backend services correctly: structured logging with appropriate log levels and context fields, metrics whose label cardinality stays bounded, and distributed tracing with meaningful span hierarchies and attribute sets. The goal is instrumentation that provides genuine insight rather than noise.
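As an illustration of structured logging with context fields, here is a minimal sketch that uses only the Python standard library. The logger name `checkout` and the field names `service`, `request_id`, and `user_segment` are hypothetical; a real service would pick names that match its own conventions and would normally add a timestamp.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    # Hypothetical context fields, chosen for illustration only.
    CONTEXT_FIELDS = ("service", "request_id", "user_segment")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy only the context fields that the caller supplied via `extra`.
        for field in self.CONTEXT_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Return an INFO-level logger that writes JSON lines to `stream`."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("payment authorized", extra={"service": "checkout", "request_id": "req-123"})
log.debug("not emitted: below the INFO threshold")
log_lines = buffer.getvalue().splitlines()
```

Because every line is a JSON object with stable keys, a log backend can filter on `request_id` or `level` directly instead of parsing free text.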

OpenTelemetry is a particular focus. As the vendor-neutral standard for observability instrumentation, OTel underpins many modern observability stacks. The assistant helps you implement OTel SDKs in your language and framework, design context propagation across service boundaries, configure sampling strategies that capture the traces you need without overwhelming your storage budget, and export telemetry to your observability platform of choice.
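To make context propagation across service boundaries concrete, the sketch below parses and regenerates a W3C Trace Context `traceparent` header by hand. It is not the OpenTelemetry SDK API; in practice the SDK's propagators do this for you. The helper names `parse_traceparent` and `child_traceparent` are hypothetical, and only header version `00` is handled.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the caller's span context, or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the Trace Context format.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: SpanContext) -> str:
    """Build the header for an outgoing call: same trace, new span ID."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
outgoing = child_traceparent(ctx)
```

The trace ID and the sampling decision carry over unchanged while each hop gets a fresh span ID, which is what lets a tracing backend stitch spans from different services into one trace.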

Alerting design is where observability delivers operational value. The assistant helps you move from alert-on-every-spike to SLO-based alerting: defining Service Level Objectives that reflect real user impact, designing error budget burn rate alerts that fire early enough to take action, and eliminating alert fatigue through careful signal selection. It also helps you build dashboards that serve specific operational purposes rather than displaying every metric available.
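The idea behind an error budget burn rate alert can be sketched in a few lines. The burn rate is the observed error rate divided by the error budget (1 minus the SLO target), so a burn rate of 1 spends the budget exactly over the SLO period. The example assumes a 99.9% availability SLO over 30 days and a paging threshold of 14.4 checked on both a long and a short window; that threshold corresponds to spending 2% of a 30-day budget in one hour and is one commonly cited choice, not a universal rule. The `Window` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed over one lookback window."""
    errors: int
    total: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.total if self.total else 0.0


def burn_rate(window: Window, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    error_budget = 1.0 - slo_target
    return window.error_rate / error_budget


def should_page(long_window: Window, short_window: Window,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn fast.

    The long window shows the problem is significant; the short window
    shows it is still happening, so a recovered spike does not page.
    The 14.4 default is an assumption taken from common practice.
    """
    return (burn_rate(long_window, slo_target) >= threshold
            and burn_rate(short_window, slo_target) >= threshold)


SLO = 0.999  # assumed target: 99.9% of requests succeed

# Ongoing incident: about 2% errors over the last hour and the last 5 minutes.
ongoing = should_page(Window(2000, 100_000), Window(150, 8_000), SLO)

# Recovered spike: the hour still looks bad, the last 5 minutes are healthy.
recovered = should_page(Window(2000, 100_000), Window(2, 8_000), SLO)
```

In this sketch `ongoing` evaluates to true and `recovered` to false: the second case leaves the hourly window elevated but does not page, because the short window shows the error rate has already dropped.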

Ideal for backend engineers instrumenting new services, SRE teams building observability platforms, teams suffering from alert fatigue or dashboard blindness, and organizations preparing for on-call rotations where the on-call engineer needs to diagnose incidents quickly with high confidence.
