Blameless Post-Mortem Guide
Post-Mortems That People Actually Want to Attend
Bad post-mortems feel like an interrogation. Good ones feel like a group debugging session where everyone is trying to solve the same puzzle together. The difference comes down to facilitation. A skilled facilitator keeps the conversation pointed at systems and processes. When someone starts blaming a person, you redirect: the question is not "who messed up" but "what about our setup allowed this mistake to have this kind of impact."
Getting the Five Whys Right
The five whys is probably the most misused technique in incident analysis. Done poorly, it becomes a blame funnel. "Why did the service go down? Because John deployed bad code. Why did John deploy bad code?" Now you are just interrogating John. Done well, it follows system paths instead. "Why did the service go down? Because a config change went out without validation. Why was there no validation? Because our pipeline skips integration tests for config changes." That gives you something concrete to fix.
What Makes an Action Item Worth Anything
One of the best predictors of whether a post-mortem was useful is the quality of its action items. "Improve monitoring" is useless. "Add a P99 latency alert on the checkout service that pages on-call when latency exceeds 500 ms for 3 consecutive minutes, owned by Sarah, due by March 15" is far more likely to get done. Every action item should answer three things: what exactly changes, who owns it, and when it is due.
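You can check the three questions mechanically before an item is accepted. Here is a minimal Python sketch, assuming a hypothetical `ActionItem` record and a hand-picked list of vague opening phrases. Neither comes from any particular ticket system, and the eight-word threshold is an arbitrary illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical phrases that tend to open a vague action item.
VAGUE_PHRASES = ("improve", "look into", "investigate", "be more careful")


@dataclass
class ActionItem:
    description: str       # what exactly changes
    owner: Optional[str]   # who owns it
    due: Optional[date]    # when it is done


def problems(item: ActionItem) -> List[str]:
    """Return the reasons an action item is not yet actionable."""
    found = []
    text = item.description.strip().lower()
    # A short description that opens with a vague phrase rarely says what changes.
    if len(text.split()) < 8 and text.startswith(VAGUE_PHRASES):
        found.append("description does not say what exactly changes")
    if not item.owner:
        found.append("no owner")
    if item.due is None:
        found.append("no due date")
    return found


# Example: the vague item from the text fails all three checks.
vague = ActionItem("Improve monitoring", owner=None, due=None)
for reason in problems(vague):
    print(reason)
```

A check like this only catches the obvious gaps. Whether the description is specific enough is still a judgment call for the facilitator.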
Making Post-Mortems Part of the Culture
Organizations that share post-mortems broadly, not just within the affected team but across all of engineering, build collective resilience. When the payments team reads about a cascading failure that hit the search team, they go check their own circuit breakers. This kind of cross-pollination is one of the highest-leverage activities in an engineering org. Google, Etsy, and PagerDuty all publish selected post-mortems externally because the learning value extends well beyond company walls.
The Real Problem: Follow-Through
Here is the uncomfortable truth about post-mortems. In many organizations, a large share of action items never get completed. They get filed into a ticket system, deprioritized during the next sprint planning, and forgotten until the same incident happens again six months later. The fix is straightforward but requires discipline: review open post-mortem action items in weekly leadership meetings and treat overdue items as a sign that the team is carrying risk nobody has acknowledged.
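The weekly review goes faster if overdue items arrive already grouped by owner. Here is a minimal Python sketch, assuming a hypothetical `OpenItem` record exported from whatever tracker you use. The field names are illustrative, not a real tracker API.

```python
from datetime import date
from typing import Dict, List, NamedTuple


class OpenItem(NamedTuple):
    title: str
    owner: str
    due: date


def overdue_by_owner(items: List[OpenItem], today: date) -> Dict[str, List[str]]:
    """Group overdue items by owner, most overdue first within each owner."""
    # Sorting by due date puts the oldest (most overdue) items first.
    late = sorted((i for i in items if i.due < today), key=lambda i: i.due)
    report: Dict[str, List[str]] = {}
    for item in late:
        days = (today - item.due).days
        report.setdefault(item.owner, []).append(
            f"{item.title} ({days} days overdue)"
        )
    return report
```

An item due today is not counted as overdue. The output is meant as an agenda for the meeting, not a scoreboard: an owner with many overdue items usually points to a prioritization problem, not a personal one.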
Incident Timeline
- Day 1: Incident resolved, initial timeline captured while everything is still fresh
- Day 2-3: Post-mortem document drafted with contributing factors and a detailed timeline
- Day 3-5: Review meeting held with everyone who was involved
- Day 5-7: Action items assigned with specific owners and deadlines
- Day 14: First check-in on action item progress
- Day 30: Action items completed and verified
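The timeline above can be turned into calendar deadlines for a given incident. Here is a minimal Python sketch, assuming that Day 1 is the day the incident was resolved, that days are calendar days, and that the later day of each range is the deadline. The guide does not state any of these, so adjust them to your own convention.

```python
from datetime import date, timedelta
from typing import List, Tuple

# (latest day, milestone) pairs taken from the timeline above.
MILESTONES = [
    (1, "Initial timeline captured"),
    (3, "Post-mortem document drafted"),
    (5, "Review meeting held"),
    (7, "Action items assigned"),
    (14, "First check-in on action items"),
    (30, "Action items completed and verified"),
]


def schedule(resolved_on: date) -> List[Tuple[date, str]]:
    """Return the deadline for each milestone, counting the resolution day as Day 1."""
    return [
        (resolved_on + timedelta(days=day - 1), label)
        for day, label in MILESTONES
    ]


# Example: print the deadlines for an incident resolved on an arbitrary date.
for deadline, label in schedule(date(2024, 1, 1)):
    print(deadline.isoformat(), label)
```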
Detection Signals
- Post-mortems getting skipped or pushed past the one-week mark
- Action items sitting incomplete past 30 days
- Same root cause showing up in multiple incidents
- People avoiding post-mortem meetings or not participating
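Two of the signals above, late post-mortems and repeated root causes, can be checked from incident records. Here is a minimal Python sketch, assuming a hypothetical `Incident` record whose `root_cause` field is a tag your team already normalizes. Participation is a human signal and is left out.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Incident:
    name: str
    resolved_on: date
    root_cause: str                       # assumed to be a normalized tag
    postmortem_on: Optional[date] = None  # None means not held yet


def repeated_root_causes(incidents: List[Incident]) -> List[str]:
    """Root causes that appear in more than one incident."""
    counts = Counter(i.root_cause for i in incidents)
    return sorted(cause for cause, n in counts.items() if n > 1)


def late_postmortems(
    incidents: List[Incident], today: date, limit_days: int = 7
) -> List[str]:
    """Incidents whose post-mortem was held, or is still pending, past the limit."""
    late = []
    for i in incidents:
        # A pending post-mortem is measured against today's date.
        held = i.postmortem_on or today
        if (held - i.resolved_on).days > limit_days:
            late.append(i.name)
    return late
```

The `limit_days` default of 7 mirrors the one-week mark in the list. Free-text root causes would need to be tagged consistently before the first check means anything.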
Prevention
- Put the post-mortem review on the calendar within 48 hours of resolution, while the context is still fresh
- Stick to a standard template so post-mortems are consistent across teams
- Track action items in a shared system with due dates and owners
- Go over open action items in weekly engineering leadership meetings
Key Points
- Blameless does not mean nobody is accountable. It means you focus on systems and processes rather than pointing fingers at individuals.
- The five whys technique only works when you follow causal chains, not blame chains.
- Action items need to be concrete, owned by someone specific, and have a deadline. Vague improvements never happen.
- Sharing post-mortems widely across engineering builds institutional knowledge and stops the same failures from repeating.
- A good post-mortem is measured by whether the same incident could happen again, not by how long the document is.
Common Mistakes
- Going through the motions instead of treating the post-mortem as a real chance to learn
- Assigning action items with no clear owner or deadline, which guarantees nothing gets done
- Stopping at the obvious cause without digging into the systemic factors underneath
- Writing a post-mortem so long and detailed that nobody bothers reading it