artifacts/intake-archive/20260622__continuity-office-intake/010-incident-memory

Incident Memory

---
catalog: "Free Training Catalog"
training_id: "010"
title: "Incident Memory"
subtitle: "Blameless postmortems that actually preserve learning"
track: "Core Practices"
estimated_time: "20–30 minutes"
audience:
  - Executives
  - Operators
  - IT / Security
  - Product
  - Compliance
learning_outcomes:
  - Retain learning from failures without blame
  - Turn incidents into durable memory
  - Prevent repeated failure through continuity
prerequisites: "Training 001–009 recommended"
level: "Introductory"
license: "Free / Open Training"
version: "1.0"
last_updated: "2025-12-18"
---

Incident Memory

Blameless postmortems that actually preserve learning

Training 010 · Core Practices · Time: 20–30 minutes

---

Core stance

Incidents are inevitable. Forgetting why they happened is optional.

Incident memory is the practice of preserving causal understanding, not assigning fault.

---

Why this lesson exists

Many organizations run postmortems, yet still:

  • Repeat the same failures
  • Lose context after a few months
  • Treat incidents as embarrassing anomalies
  • Optimize reports for defensibility instead of learning

The problem is not the postmortem ritual. It is the absence of memory continuity.

---

What incident memory is (and is not)

Incident memory is

  • Causal, not narrative
  • Durable beyond the people involved
  • Accessible to future operators
  • Explicit about assumptions and conditions

Incident memory is not

  • A blame assignment
  • A performance evaluation
  • A legal defense memo
  • A checklist exercise

Learning dies when incidents are treated as personal failures instead of system signals.

---

Why postmortems usually fail

Postmortems often fail because:

  • They focus on timeline, not causality
  • They stop at “human error”
  • They are written once and never revisited
  • They are stored but never retrieved

This creates the illusion of learning without its benefits.

---

The incident memory pattern

A continuity-safe incident memory answers five questions:

  1. What failed?
     (Observed behavior, not interpretation)

  2. Why did it fail?
     (Causal chain, including system and context)

  3. What assumptions were wrong or stressed?
     (What we believed that no longer holds)

  4. What changed as a result?
     (Decisions, safeguards, boundaries)

  5. What would cause this to be revisited?
     (Conditions, not dates)

If these are preserved, learning survives turnover.
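One way to make the five questions hard to skip is to store each incident as a structured record and check it for gaps. The sketch below is only an illustration: the `IncidentMemory` class, its field names, and the two helper methods are hypothetical and not part of this training. It flags unanswered questions and a causal chain that stops at "human error".

```python
from dataclasses import dataclass, field, fields


@dataclass
class IncidentMemory:
    """One incident recorded as answers to the five questions (hypothetical schema)."""

    title: str
    what_failed: str = ""       # observed behavior, not interpretation
    why_it_failed: str = ""     # causal chain, including system and context
    broken_assumptions: list[str] = field(default_factory=list)
    changes_made: list[str] = field(default_factory=list)
    revisit_conditions: list[str] = field(default_factory=list)  # conditions, not dates

    def unanswered(self) -> list[str]:
        """Return the names of the five questions that are still empty."""
        return [
            f.name
            for f in fields(self)
            if f.name != "title" and not getattr(self, f.name)
        ]

    def stops_at_human_error(self) -> bool:
        """True when the causal chain is nothing more than a generic label."""
        label = self.why_it_failed.strip().lower().rstrip(".")
        return label in {"human error", "operator error"}


# Illustrative record; the incident details are invented for the example.
record = IncidentMemory(
    title="Nightly export silently dropped rows",
    what_failed="Export job reported success but wrote 0 rows for two nights.",
    why_it_failed="Human error.",
    broken_assumptions=["A zero exit code means the export is complete."],
)

print(record.unanswered())            # ['changes_made', 'revisit_conditions']
print(record.stops_at_human_error())  # True
```

A check like this cannot judge whether an answer is good. It only makes a missing answer visible before the record is filed.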

---

Blameless does not mean consequence-free

Blameless means:

  • We do not punish people for system failures
  • We do not erase responsibility
  • We do not avoid hard truths

Accountability remains, but it targets systems and decisions, not individuals.

---

Incident memory and AI

AI systems:

  • Fail in non-obvious ways
  • Mask causality with performance
  • Scale small errors quickly

Without incident memory:

  • AI mistakes repeat silently
  • Confidence replaces understanding
  • Oversight erodes

Incident memory creates:

  • Explainable failure
  • Safer iteration
  • Defensible automation

---

Exercises

Drill 1 — Rewrite an Old Incident

Pick a past incident report.

Rewrite it to clearly answer:

  • Why it failed
  • What assumption broke
  • What changed

Ignore the timeline if needed.

---

Drill 2 — Assumption Capture

During your next incident discussion, ask:

“What did we assume that turned out not to be true?”

Write that down explicitly.

---

Drill 3 — Memory Placement

Decide where incident memory should live so it is:

  • Discoverable
  • Trusted
  • Revisitable

Move one incident there.
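As a rough illustration of "discoverable" and "revisitable", the sketch below keeps each incident as a JSON file in one shared folder and searches the recorded answers by keyword. The folder layout, the `save_incident` and `find_incidents` helpers, and the sample records are all assumptions made for this example. Whether the store is trusted depends on ownership and review, which code does not provide.

```python
import json
import tempfile
from pathlib import Path


def save_incident(store: Path, slug: str, record: dict) -> Path:
    """Write one incident record as a JSON file in the shared store."""
    store.mkdir(parents=True, exist_ok=True)
    path = store / f"{slug}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def _text_of(record: dict) -> str:
    """Flatten the recorded answers (values only, not field names) to lowercase text."""
    parts = []
    for value in record.values():
        parts.extend(value if isinstance(value, list) else [value])
    return " ".join(str(p) for p in parts).lower()


def find_incidents(store: Path, keyword: str) -> list[str]:
    """Return slugs of records whose answers mention the keyword (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    for path in sorted(store.glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        if needle in _text_of(record):
            hits.append(path.stem)
    return hits


# Illustrative usage in a temporary folder; the records are invented.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "incident-memory"
    save_incident(store, "2025-export-dropped-rows", {
        "what_failed": "Export job reported success but wrote 0 rows.",
        "broken_assumptions": ["A zero exit code means the export is complete."],
    })
    save_incident(store, "2025-cert-expiry", {
        "what_failed": "Internal API rejected connections after a certificate expired.",
        "broken_assumptions": ["Certificate renewal is automated for every host."],
    })
    print(find_incidents(store, "exit code"))  # ['2025-export-dropped-rows']
```

Plain files and substring search are deliberately minimal. The point is that a future operator can find a record by the assumption that broke, not only by its date or title.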

---

FAQ

Isn’t this just SRE practice?
SRE techniques are one implementation. Incident memory applies to all failures, not just outages.

Won’t this create legal risk?
In practice, clear causal understanding reduces repeated harm and exposure.

Who owns incident memory?
The incident owner captures it. Continuity ensures it persists.

---

Suggested next step

Take one recent incident. Preserve its causal learning using the five-question pattern.

That single act makes the same failure less likely to recur.

---

Next: Training 011 · AI Mandates & Boundaries

How to prevent silent scope expansion in automated systems.