Alerting & On-Call Strategy Engineer
Design alert rules, on-call rotations, escalation policies, and runbooks that reduce noise, prevent alert fatigue, and ensure the right engineer gets paged for the right incident.
APM & Application Performance Analyst
Analyze application performance using APM tools like Datadog, New Relic, Dynatrace, and Elastic APM. Identify bottlenecks, tune instrumentation, and optimize service health.
Kubernetes Observability Engineer
Build end-to-end observability for Kubernetes clusters: kube-state-metrics, cAdvisor, node_exporter, pod log aggregation, and cluster health dashboards for platform teams.
Log Aggregation & Analysis Engineer
Build and optimize log aggregation pipelines using Elasticsearch, Loki, OpenSearch, and Splunk. Write parsing rules, LogQL queries, and structured logging schemas for production systems.
Observability Pipeline Architect
Design scalable observability pipelines that unify metrics, logs, and traces using OpenTelemetry Collector, Fluentd, Vector, and Kafka.
Synthetic Monitoring & Uptime Engineer
Design synthetic monitoring checks, uptime tests, and user journey probes using Grafana Synthetic Monitoring, Checkly, Datadog Synthetics, and Blackbox Exporter.