The incident is the cheap part

An investigation only pays off when its findings change something real on the plant (a setpoint, a procedure, a safeguard) and are followed through to completion. A report that’s written, filed and forgotten is pure cost, and the same thing happens again.

An incident is expensive. The lost production, the damage, the people pulled off their normal jobs to investigate, and sometimes a great deal worse: all of that gets paid for whether or not anything improves afterwards. The only part that can earn any of it back is the lesson, and the lesson is the part that most often gets wasted.

I’ve read a lot of investigation reports over the years. The good ones are genuinely satisfying to read, with a clear story of what happened, an honest root cause and a sensible set of recommendations. And then, more often than I’d like, nothing on the plant actually changes. The report gets written, passed round and filed, and the unit carries on exactly as it did before. That’s filing, not learning.

The rule already asks for more than a report

It helps to remember that simply investigating and filing was never what the rule intended. The US process safety management regulation does require an employer to investigate every incident that resulted in, or could reasonably have resulted in, a catastrophic release of a highly hazardous chemical (OSHA 29 CFR 1910.119(m)(1)), and to start as promptly as possible and no later than 48 hours afterwards (m)(2). But it doesn’t stop at the report. It goes on to require a system that promptly addresses and resolves the findings and recommendations, with the resolutions and corrective actions documented (m)(5), that the report is reviewed with the people whose jobs it affects (m)(6), and that the report is kept for five years (m)(7). Of all that, the part that carries the value is the fixing, not the writing-up. The change that actually makes the plant safer is the thing you were really paying for.

Look for the safeguard that failed, not the person to blame

A lesson can only change something if the investigation found the right thing to change in the first place. Done well, an investigation works out which safeguard failed: the trip that was switched off, the procedure that was unclear, the alarm that got lost in the noise. Done badly, it stops at the person who happened to be there when it went wrong. “Operator error” is almost never the real cause; it’s just where an investigation stops when it runs out of energy. If you blame the person, all you learn is who to move. If you find the safeguard that failed, you learn what to change so that the next person in that seat can’t make the same mistake. That difference is the whole point, because only the second kind of finding turns into a real change: a different setpoint, a clearer procedure, an extra safeguard, something that actually sticks.

Judge an investigation by what’s different on the plant because of it. If you can’t point to the setpoint, the procedure or the safeguard that changed, the lesson never really landed, however well the report reads.

BP Texas City and the lessons that didn’t stick

The clearest example is BP’s Texas City refinery in 2005, where an explosion killed 15 workers. The US Chemical Safety Board found that the disaster was caused by safety problems running right through the organisation, not by a single broken valve or one bad shift. The independent review that followed, the Baker Panel, went further and found the same kinds of problem across all five of BP’s US refineries, not just Texas City. The lessons already existed. What was missing was anyone holding on to them and making them change how each site was actually run.

Trevor Kletz, who did more than almost anyone to shape how we think about process safety, summed this up years earlier when he said that organisations have no memory. Only people remember, and when they move on, the accident they once investigated gets forgotten until it happens again, often in the same company a decade or so later (Kletz, Lessons from Disaster, IChemE, 1993). A neatly filed report is exactly how an organisation manages to forget while telling itself it has learned.

Where the lesson gets lost

  • The investigation finishes and the actions quietly slip. This is the most common failure of all, and it’s exactly why the rule insists the findings be resolved, not just recorded.
  • The actions that do get written down change nothing real. “Raise awareness” and “remind the crew” aren’t changes. A setpoint, a lock on a valve, a step in a procedure, a check on someone’s competence: those are.
  • Nobody is responsible for seeing the actions through, so they fade away. Writing down how each one was resolved isn’t box-ticking, it’s the thing that turns a finding into an actual change.
  • The lesson stays on the one unit where it happened. An incident on one unit is a warning to every similar unit, and if it never gets passed across to the others, you’ve paid for a lesson and used a fraction of it.

The takeaway

I treat the report as the cheap part and the change as the thing I’m actually after. Before I’m willing to close an investigation, I want to be able to point at something real that is now different (a setpoint, a procedure, a safeguard) and check that it’s genuinely in place, and then I want that same change taken across to every unit it applies to. The test I use is a simple one: if this exact thing happened again a year from now, what specifically would stop it? If the honest answer is that the report is on file, then you’ve paid for the incident and left the lesson lying on the table.
References

  • OSHA, 29 CFR 1910.119, Process safety management of highly hazardous chemicals: incident investigation at paragraph (m), including the 48-hour start (m)(2), resolution of findings (m)(5), review with affected personnel (m)(6) and five-year retention (m)(7).
  • US Chemical Safety and Hazard Investigation Board, Investigation Report: Refinery Explosion and Fire, BP, Texas City, Texas, March 23, 2005, Report No. 2005-04-I-TX (2007).
  • T. A. Kletz, Lessons from Disaster: How Organisations Have No Memory and Accidents Recur (Institution of Chemical Engineers, 1993).