'She was there at every collapse' — a proper base-rate examination

What the claim actually is

The prosecution’s headline statistical argument was: Lucy Letby was on shift for all of a selected set of 25 “suspicious” events. No other nurse was on shift for all of them. Therefore the probability that this pattern arose by chance, if she were innocent, is vanishingly small; therefore she cannot be innocent.

That argument has the same structure as Sir Roy Meadow’s “1 in 73 million” calculation in the Sally Clark case. It is a probability of the evidence given innocence presented as if it were a probability of innocence given the evidence. That is the prosecutor’s fallacy.
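
The gap between those two probabilities can be made concrete with Bayes' rule. The sketch below is illustrative only: the function name and every number in it are assumptions chosen for the example, not figures from the trial.

```python
def posterior_innocence(prior_guilt, p_evidence_given_guilt, p_evidence_given_innocent):
    """Bayes' rule: return P(innocent | evidence)."""
    prior_innocent = 1.0 - prior_guilt
    numerator = p_evidence_given_innocent * prior_innocent
    denominator = numerator + p_evidence_given_guilt * prior_guilt
    return numerator / denominator


# Hypothetical inputs, picked only to show how far apart the two quantities can be.
p_e_given_innocent = 1e-4  # assumed "chance" probability of the evidence
prior_guilt = 1e-6         # assumed prior that a given nurse is a serial killer

p_innocent = posterior_innocence(prior_guilt, 1.0, p_e_given_innocent)
print(f"P(evidence | innocent) = {p_e_given_innocent}")
print(f"P(innocent | evidence) = {p_innocent:.3f}")
```

With these made-up inputs the evidence has a 1-in-10,000 chance of arising under innocence, yet the probability of innocence given the evidence is still about 0.99. The two numbers answer different questions, and the prior matters.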

Step 1: the selection effect

The 25 events were selected as “suspicious” in part because Letby was on shift. Events where she was not on shift, including collapses and deaths outside the indictment, were not included in the chart. When you select events partly on the very feature you are about to use as evidence, the match between evidence and feature is guaranteed by construction.

Prof. Richard Gill has called this the Texas sharpshooter problem: painting the target around the bullet hole. See our analysis of the de Berk parallel and the Sally Clark parallel for how the Dutch and English statistical communities have each corrected this same error in earlier cases.
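
A toy simulation shows how selection alone manufactures the match. Everything here is an assumption made for illustration: the function name, the 40% attendance rate, and the parameter p_keep_if_absent, which models how often an event is still charted when the nurse was absent.

```python
import random


def simulate_selection(n_events=100, p_on_shift=0.4, p_keep_if_absent=0.0, seed=0):
    """Return (raw attendance rate, attendance rate among charted events)."""
    rng = random.Random(seed)
    # For each event, was the nurse of interest on shift? (assumed independent)
    on_shift = [rng.random() < p_on_shift for _ in range(n_events)]
    # Selection step: events with the nurse present are always charted;
    # events without her are charted only with probability p_keep_if_absent.
    charted = [present for present in on_shift
               if present or rng.random() < p_keep_if_absent]
    raw_rate = sum(on_shift) / n_events
    charted_rate = sum(charted) / len(charted) if charted else float("nan")
    return raw_rate, charted_rate


raw, charted = simulate_selection()
print(f"attendance across all events:     {raw:.2f}")
print(f"attendance across charted events: {charted:.2f}")
```

With p_keep_if_absent set to 0 the charted attendance is 100% however rarely the nurse actually worked. Raising it towards 1 weakens the effect, but the charted rate never falls below the raw rate.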

Step 2: the base rate of shift attendance

Lucy Letby worked a lot of shifts. She was a young nurse without childcare constraints and she routinely worked unsociable shifts, night shifts, and weekends, the shifts other nurses were more likely to avoid. Over a two-year cluster period, that means the probability of her being on any given shift is not 1 in 20. It is substantially higher.

The specific calculation has been done publicly by Prof. Gill and by the triedbystats.com team. The headline result is that a nurse in Letby’s shift pattern has a non-trivial probability of being on shift for any given event on the unit, and that probability compounds across multiple events into a near-certainty that at least one nurse will have been on shift for all of them — even if no wrongdoing is occurring.
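
The arithmetic behind this step can be sketched in a few lines. This is not the published calculation: the shift counts below are hypothetical, and the model assumes events fall on independent shifts, which is itself a simplification.

```python
def attendance_probability(shifts_worked, total_shifts):
    """Fraction of all shifts a nurse worked, used as her base rate."""
    return shifts_worked / total_shifts


def prob_present_at_all(p_attend, n_events):
    """Chance of being on shift for every one of n_events,
    assuming (simplistically) that the events fall on independent shifts."""
    return p_attend ** n_events


total_shifts = 730  # hypothetical: two shifts a day for one year
rota = {"typical nurse": 150, "heavy-rota nurse": 330}  # hypothetical shift counts

for label, worked in rota.items():
    p = attendance_probability(worked, total_shifts)
    print(f"{label}: base rate {p:.3f}, "
          f"present at 5 of 5 events {prob_present_at_all(p, 5):.5f}")
```

In this toy rota the heavy-rota nurse works 2.2 times as many shifts as the typical nurse, but is about fifty times more likely to be present at five events out of five. Small differences in base rate grow quickly once several events are multiplied together.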

Step 3: the compounding

If there are 20 nurses on the unit, and each has a moderate probability of being on a given shift, then the probability that some nurse was on shift for all of 25 selected events is not 1 in 20 to the 25th power. It is close to 1, because shift attendance is uneven: some nurses work far more shifts than others, and the nurse who works the most will dominate any retrospectively selected subset of events.

The triedbystats.com visual simulation lets the reader work through this with different input assumptions. The result is robust: under reasonable assumptions about shift attendance distribution, the prosecution’s chart looks inevitable rather than improbable.
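
For readers who prefer arithmetic to a visual simulation, here is a minimal sketch of the same idea. It is not the triedbystats.com model. It assumes 60 events in the period (a made-up figure), independent attendance across nurses and events, and it asks: what is the chance that at least one nurse was on shift for 25 or more of them, so that a chart of 25 events with a full row for that nurse could be drawn afterwards?

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def prob_some_nurse_covers(attend_probs, n_events, k_selected):
    """P(at least one nurse was on shift for >= k_selected of n_events),
    assuming independence across nurses and across events."""
    p_none = 1.0
    for p in attend_probs:
        p_none *= 1.0 - binom_tail(n_events, p, k_selected)
    return 1.0 - p_none


# Hypothetical attendance profiles for 20 nurses, same average workload (0.2).
uniform = [0.2] * 20
skewed = [0.5, 0.35, 0.35] + [2.8 / 17] * 17

for label, probs in (("uniform", uniform), ("skewed", skewed)):
    print(label, prob_some_nurse_covers(probs, n_events=60, k_selected=25))
```

Under the uniform profile such a chart is rare. Under the skewed profile, with the same total staffing, it becomes the expected outcome. The inputs are assumptions, so the point is the direction and size of the change, not the specific values.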

Step 4: what happens when collapses outside the chart are included

When collapses and deaths during the cluster period which were not included in the prosecution chart are plotted alongside the ones that were, the pattern becomes much less exclusive to Letby. Other nurses show full or near-full rows of shift-overlap with various subsets. The chart’s “Letby only” feature was an artefact of the subset that was charted, not a property of the cluster period as a whole.
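
The mechanism can be shown on a toy roster. The events, nurse labels and helper names below are entirely hypothetical and are not drawn from the case record. They only show how a "one nurse only" row depends on which events are charted.

```python
def attendance_counts(events, event_ids):
    """For each nurse, count how many of the given events they were on shift for."""
    counts = {}
    for event_id in event_ids:
        for nurse in events[event_id]:
            counts[nurse] = counts.get(nurse, 0) + 1
    return counts


def present_at_all(events, event_ids):
    """Nurses who were on shift for every one of the given events."""
    ids = list(event_ids)
    counts = attendance_counts(events, ids)
    return sorted(nurse for nurse, c in counts.items() if c == len(ids))


# Hypothetical roster: event id -> nurses on shift at the time.
events = {
    "e1": {"A", "B"},
    "e2": {"A", "C"},
    "e3": {"A", "B"},
    "e4": {"B", "C"},
    "e5": {"B", "D"},
    "e6": {"C", "D"},
}

print(present_at_all(events, ["e1", "e2", "e3"]))        # charted subset
print(present_at_all(events, ["e1", "e3", "e4", "e5"]))  # a different subset
print(present_at_all(events, events))                    # every event
```

In this toy data, nurse A has a full row for the first subset, nurse B has a full row for the second, and nobody has a full row once every event is included.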

Step 5: conditional on an outbreak and systems failure

A NICU crisis caused by a combination of superbug outbreak, staffing shortages and infrastructure failures will produce deterioration events clustered in time. Whoever works the most shifts during that time window will end up plotted against the most events: mechanically, with no wrongdoing, simply as a function of who was on shift when the unit was under most strain.

The Datix record (see our deep-dive) and the Guardian investigation (see our summary) together establish that the unit was under that kind of strain during exactly the period the prosecution chart covers. Once that baseline is taken into account, the pattern in the chart is not improbable at all.
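
A rough expected-value sketch illustrates the mechanism. The shift rates, the strain window and both rosters are invented for the example. The only thing it demonstrates is that two nurses with the same total workload can expect very different event counts depending on when they worked.

```python
def expected_events_witnessed(shift_rates, roster):
    """Expected number of events falling on the shifts a nurse worked.

    shift_rates[i] is the expected number of collapses on shift i;
    roster is the set of shift indices the nurse worked.
    """
    return sum(shift_rates[i] for i in roster)


# Hypothetical unit: 100 shifts, a low baseline rate, and a strain window
# (shifts 40-59) during which the per-shift event rate is much higher.
shift_rates = [0.3 if 40 <= i < 60 else 0.02 for i in range(100)]

# Two hypothetical nurses who each work 25 shifts.
nurse_even = set(range(0, 100, 4))                        # spread evenly
nurse_strain = set(range(40, 60)) | {0, 10, 20, 70, 80}   # concentrated in the window

print("evenly spread rota: ", expected_events_witnessed(shift_rates, nurse_even))
print("strain-window rota: ", expected_events_witnessed(shift_rates, nurse_strain))
```

With these assumed rates the evenly spread rota expects about 1.9 events and the strain-window rota about 6.1, although both nurses work the same number of shifts and neither does anything to cause an event.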

Putting it together

The “she was there at every collapse” claim, properly analysed, is:

Based on a selection effect (which events are in the chart) that makes the match between nurse and events automatic.
Calculated using an implicit independence assumption that is wrong (collapses on a unit in outbreak are not independent events).
Compared against a uniform-nurse-attendance model that is wrong (shift attendance is skewed; heavy-shift nurses are over-represented in any selection).
Presented without a qualified statistician to identify the errors.

Every one of these errors was, in the Royal Statistical Society’s post-Sally-Clark guidance, explicitly flagged as the kind of argument not to be put to a jury. The fact that it was nevertheless put to this jury is, in itself, a question for the Criminal Cases Review Commission (CCRC).
