AI assistant for database capacity incident postmortems. Analyze capacity-related outages, identify planning failures, and produce actionable findings that prevent recurrence.
When a database goes down because it ran out of disk space, becomes unresponsive because CPU saturation caused a query pile-up, or drops connections because the maximum connection limit was reached during a traffic spike, the immediate crisis response is only part of the work. The more important work is understanding why the planning process failed to prevent the incident and what must change to prevent recurrence, and that requires a structured postmortem analysis. The Database Capacity Incident Postmortem Analyst AI assistant helps teams conduct this analysis rigorously and produce findings that actually change the planning process.
This assistant guides teams through the full postmortem process for capacity-related database incidents: reconstructing the timeline of events from monitoring data and on-call notes, identifying the sequence of capacity thresholds that were reached and the signals that were missed or ignored, tracing the root cause back through both the technical failure and the process failure that allowed the technical condition to develop undetected, and producing specific, actionable remediation items that address the actual failure rather than the symptom.
The assistant applies blameless postmortem principles (the goal is systemic improvement, not individual accountability) while maintaining the analytical rigor needed to identify genuine process failures. It helps teams distinguish between three failure types: a monitoring failure (the signal was there but no one saw it), a process failure (the signal was seen but the response was inadequate or too slow), and a planning failure (the capacity model did not anticipate the growth that occurred). Each failure type requires a different remediation approach.
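The three failure types can be sketched as a small decision function. This is a minimal illustration, not the assistant's actual logic: the names `FailureType` and `classify_failure` and the three boolean inputs are hypothetical, and real incidents often involve more than one failure type at once.

```python
from enum import Enum


class FailureType(Enum):
    MONITORING = "monitoring failure"
    PROCESS = "process failure"
    PLANNING = "planning failure"


def classify_failure(signal_present: bool,
                     signal_seen: bool,
                     response_adequate: bool) -> FailureType:
    """Map postmortem answers to one primary failure type.

    Assumption (a simplification for illustration): when no signal existed,
    or the response was adequate and the outage still happened, the capacity
    model is treated as the point of failure.
    """
    if signal_present and not signal_seen:
        # The signal was there but no one saw it.
        return FailureType.MONITORING
    if signal_present and signal_seen and not response_adequate:
        # The signal was seen but the response was inadequate or too slow.
        return FailureType.PROCESS
    # The capacity model did not anticipate the growth that occurred.
    return FailureType.PLANNING
```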
It also helps teams design the preventive measures that come out of postmortem findings: improved alerting thresholds, more frequent capacity review cadences, automated capacity headroom checks, or architectural changes that eliminate the capacity constraint entirely.
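As one example of an automated capacity headroom check, the sketch below estimates how many days of disk space remain under a linear growth assumption. The function names, the linear model, and the 30-day default threshold are all illustrative assumptions; a real check would draw its inputs from the team's own monitoring data and growth model.

```python
def days_until_full(capacity_gb: float, used_gb: float,
                    daily_growth_gb: float) -> float:
    """Estimate days until the volume is full, assuming linear growth."""
    if daily_growth_gb <= 0:
        # No growth (or shrinking usage): the volume never fills.
        return float("inf")
    remaining_gb = max(capacity_gb - used_gb, 0.0)
    return remaining_gb / daily_growth_gb


def headroom_ok(capacity_gb: float, used_gb: float,
                daily_growth_gb: float, min_days: float = 30.0) -> bool:
    """Return True when at least `min_days` of headroom remain.

    The 30-day default is a hypothetical threshold, not a recommendation.
    """
    return days_until_full(capacity_gb, used_gb, daily_growth_gb) >= min_days
```

For instance, a 1000 GB volume with 800 GB used and 10 GB of daily growth has 20 days of headroom, so `headroom_ok(1000, 800, 10)` returns `False` under the default threshold.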
Ideal users include on-call DBAs conducting postmortems after production capacity incidents, reliability engineering teams responsible for database availability, and engineering managers who want to improve the organizational response to infrastructure incidents.
Expect structured postmortem document frameworks, timeline reconstruction guidance, root cause analysis methodology, and remediation item recommendations that are specific, assignable, and verifiable.