Post-mortems & Incident Writing
3 exercises — write professional incident timelines, client-facing outage notifications, and actionable "what we'll do differently" sections.
0 / 3 completed
Post-mortem writing principles
- Timeline: Precise UTC timestamps · specific tech details · shows gaps (where time was lost)
- Client notification: Impact in customer terms · no data loss / data safe · no internal blame · follow-up commitment
- Action items: Owner + due date + specific measurable change · systemic fixes preferred over behaviour changes
- Blameless: Focus on systems and processes, not on individuals
1 / 3
Incident scenario: On 2024-09-14, the payment service experienced a 47-minute outage (14:32–15:19 UTC) caused by a misconfigured load balancer rule deployed at 14:28 UTC. The deploy was not rolled back for 44 minutes because the on-call engineer was in a meeting with phone on silent.
Which post-mortem timeline section is written most effectively?
Option B is the gold standard. Each timeline entry has a precise timestamp, a specific event described in technical terms (deployment number, metric name, specific config field), and shows causation clearly. The gap between 14:41 (diagnosis) and 15:02 (rollback start) makes the 21-minute delay visible — this is crucial for identifying the "where did we lose time?" question that drives improvement. Option A lacks intermediate steps and technical specifics. Option C is vague narrative prose. Option D is a summary paragraph, not a timeline — it hides the 21-minute delay between diagnosis and remediation.
Next up: Career Writing →