What Poisson analysis is
A Poisson distribution models the probability of a given number of independent rare events occurring within a fixed interval of time or space, when the average rate of occurrence is known. It is the standard statistical framework for answering the question: “Is this cluster of events more concentrated than we would expect by chance?”
The critical inputs are the baseline rate (how often does this event happen per unit of time in the relevant population?) and the observed count (how many events occurred in the window under scrutiny?). From these, the Poisson model gives the probability of observing a count at least as extreme as the one seen, assuming no unusual cause. If that probability is very low, the cluster is statistically anomalous relative to the chosen baseline. If it is not very low, the cluster is consistent with natural variation around that baseline.
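The tail-probability calculation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the baseline of 3 events per window and the observed count of 8 are hypothetical numbers chosen for the example, not figures from any real unit.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_upper_tail(k_observed: int, lam: float) -> float:
    """P(X >= k_observed), computed as 1 - P(X <= k_observed - 1)."""
    if k_observed <= 0:
        return 1.0
    cdf_below = sum(poisson_pmf(k, lam) for k in range(k_observed))
    return max(0.0, 1.0 - cdf_below)

# Hypothetical inputs for illustration only.
expected_per_window = 3.0   # assumed baseline: 3 events per window
observed = 8                # assumed observed count in the window

p_value = poisson_upper_tail(observed, expected_per_window)
print(f"P(X >= {observed} | rate = {expected_per_window}) = {p_value:.4f}")
```

The result depends entirely on the baseline rate supplied, which is why the choice of baseline matters so much in what follows.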
The framework has well-known pitfalls. Two are particularly relevant here: the “prosecutor’s fallacy” (confusing the probability of the cluster given innocence with the probability of innocence given the cluster) and the “Texas sharpshooter fallacy” (identifying the cluster boundary after seeing the data, which artificially maximises its apparent anomaly). Both were identified by the independent statistical experts as concerns with the original prosecution reasoning.
Why neonatal-mortality clusters are Poisson-appropriate
Neonatal deaths at a single unit are rare, approximately independent events that occur at a roughly constant underlying rate within a comparable population and time window. They are not contagious (one death does not directly cause another), they are not scheduled, and they are individually low-probability. These are the conditions under which a Poisson model is a reasonable starting point, provided the baseline rate and the independence assumption are checked.
The key challenge in neonatal-unit applications is specifying the correct baseline rate. The baseline must account for the unit’s case-mix: a unit receiving more extremely preterm infants, multiples, or complex referrals will have a higher expected mortality rate than a unit receiving mostly low-risk term neonates. Applying a generic national average to a non-representative unit can produce a spurious anomaly. This is precisely the case-mix-adjustment problem that the Hawkins/Gill triplets-to-singletons research and the Prakesh Shah cohort analysis address.
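One simple way to express a case-mix-adjusted baseline is to sum, over risk strata, the number of admissions multiplied by the stratum-specific mortality rate. The sketch below assumes this stratified approach. The stratum names, admission counts, and rates are all hypothetical, and it is not a reconstruction of any expert's actual method.

```python
from typing import Mapping

def expected_deaths(admissions: Mapping[str, int],
                    rates: Mapping[str, float]) -> float:
    """Expected deaths = sum over strata of admissions * stratum mortality rate."""
    return sum(admissions[s] * rates[s] for s in admissions)

# Hypothetical per-admission mortality rates by risk stratum.
stratum_rates = {
    "term_low_risk": 0.001,
    "moderate_preterm": 0.01,
    "extreme_preterm_or_complex": 0.10,
}

# Hypothetical intake before and after a shift toward higher-risk admissions.
earlier_intake = {"term_low_risk": 300, "moderate_preterm": 80,
                  "extreme_preterm_or_complex": 10}
later_intake = {"term_low_risk": 280, "moderate_preterm": 90,
                "extreme_preterm_or_complex": 30}

earlier_expected = expected_deaths(earlier_intake, stratum_rates)
later_expected = expected_deaths(later_intake, stratum_rates)

# Unadjusted comparator: carry the earlier crude rate over to the later intake.
crude_rate = earlier_expected / sum(earlier_intake.values())
unadjusted_expected = crude_rate * sum(later_intake.values())

print(f"Earlier period expected deaths:  {earlier_expected:.2f}")
print(f"Later period, unadjusted:        {unadjusted_expected:.2f}")
print(f"Later period, case-mix adjusted: {later_expected:.2f}")
```

In this illustration total admissions barely change, yet the adjusted expectation roughly doubles because the highest-risk stratum has grown. An unadjusted comparator misses that entirely.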
The COCH baseline
The Countess of Chester Hospital neonatal unit expanded its referral intake in the period immediately preceding the 2015–2016 cluster. The unit began accepting more multiple-gestation births (twins and triplets) and higher-acuity transfers that would previously have gone to tertiary centres. Each of these case-mix shifts independently raises the expected mortality rate.
The prosecution’s cluster analysis did not apply a case-mix-adjusted baseline. It compared raw observed deaths against a unit-specific historical baseline that predated the case-mix change, or against national averages that do not reflect the COCH intake profile. Applying either comparator to a unit whose intake had become significantly higher-risk will systematically overstate the anomaly. Norman Fenton and Jane Hutton both identify this as a core methodological weakness in the prosecution’s statistical reasoning.
Applying Poisson to the 2015–2016 cluster
Under the prosecution’s unadjusted comparator, the 2015–2016 COCH death rate appeared to be several standard deviations above the historical COCH baseline. Under a case-mix-adjusted baseline (one that accounts for the shift toward higher-risk multiples and complex referrals), the excess mortality is substantially reduced or eliminated entirely, depending on the adjustment assumptions used.
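For a Poisson count the standard deviation is the square root of the expected count, so the same observed figure can sit several standard deviations above one baseline and close to another. The sketch below illustrates that sensitivity with hypothetical numbers only. It uses a rough normal-style summary; for small expected counts an exact Poisson tail probability is the better measure.

```python
import math

def excess_in_sd(observed: int, expected: float) -> float:
    """Excess over the expected count, in Poisson standard deviations.

    For X ~ Poisson(expected), the standard deviation is sqrt(expected).
    """
    return (observed - expected) / math.sqrt(expected)

# Hypothetical figures for illustration only.
observed_deaths = 9
unadjusted_baseline = 3.0   # assumed expectation ignoring the intake shift
adjusted_baseline = 6.0     # assumed expectation after case-mix adjustment

print(f"Unadjusted: {excess_in_sd(observed_deaths, unadjusted_baseline):.2f} SD")
print(f"Adjusted:   {excess_in_sd(observed_deaths, adjusted_baseline):.2f} SD")
```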
David Spiegelhalter’s probability commentary additionally identifies a multiple-testing concern: the investigation was triggered after a cluster was observed, and the statistical window was defined around the cluster period. A cluster defined retrospectively will tend to appear more extreme than a prospectively defined one, because the window is, in effect, chosen to maximise the apparent anomaly. Adjusting for this effect further reduces the statistical significance of the cluster.
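The window-selection effect can be illustrated with a small simulation. The sketch below generates monthly counts at a constant rate with no unusual cause, then compares how often a pre-specified 12-month window reaches an alarm threshold with how often the worst 12-month window found after the fact does. The monthly rate, series length, and threshold are hypothetical, and the sampler is a plain textbook method rather than anything taken from the experts' work.

```python
import math
import random
from typing import Sequence, Tuple

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1

def max_window_sum(counts: Sequence[int], width: int) -> int:
    """Largest total over any run of `width` consecutive entries."""
    current = sum(counts[:width])
    best = current
    for i in range(width, len(counts)):
        current += counts[i] - counts[i - width]
        best = max(best, current)
    return best

def simulate(n_sims: int = 2000, n_months: int = 60, width: int = 12,
             monthly_rate: float = 0.25, alarm_count: int = 8,
             seed: int = 0) -> Tuple[float, float]:
    """Return (fixed-window alarm frequency, best-window alarm frequency)."""
    rng = random.Random(seed)
    fixed_hits = 0
    scanned_hits = 0
    for _ in range(n_sims):
        counts = [sample_poisson(monthly_rate, rng) for _ in range(n_months)]
        # Prospective: the window is fixed in advance (the first 12 months).
        if sum(counts[:width]) >= alarm_count:
            fixed_hits += 1
        # Retrospective: the window is picked after looking at the data.
        if max_window_sum(counts, width) >= alarm_count:
            scanned_hits += 1
    return fixed_hits / n_sims, scanned_hits / n_sims

fixed_rate, scanned_rate = simulate()
print(f"Alarm frequency, pre-specified window: {fixed_rate:.3f}")
print(f"Alarm frequency, best window in hindsight: {scanned_rate:.3f}")
```

Because the hindsight search includes the pre-specified window among its candidates, its alarm frequency can never be lower, and with several years of data it is typically much higher.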
Richard Gill’s statistical methodology paper addresses the independence assumption: if some deaths share a common natural cause (a nosocomial infection episode, a ventilator cohort, a shared prematurity risk factor), the events are not fully independent, and the standard Poisson calculation understates the natural variability of the count. A cluster of partially correlated events therefore looks more anomalous under an independent-Poisson model than it actually is.
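A toy comparison shows why dependence matters. The sketch below assumes a deliberately simplified clustered model in which events arrive in shared-cause episodes of exactly two events each. That assumption is made purely for illustration and is not taken from the paper. Both models have the same mean count, but the clustered one puts far more probability on high totals.

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    below = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))
    return max(0.0, 1.0 - below)

# Hypothetical figures for illustration only.
mean_events = 3.0          # same expected count under both models
events_per_episode = 2     # assumed: each shared-cause episode yields 2 events
alarm_count = 8

# Independent model: the event count itself is Poisson(mean_events).
independent_tail = poisson_tail(alarm_count, mean_events)

# Clustered model: episodes ~ Poisson(mean_events / events_per_episode),
# total events = events_per_episode * episodes.
episodes_needed = math.ceil(alarm_count / events_per_episode)
clustered_tail = poisson_tail(episodes_needed, mean_events / events_per_episode)

print(f"P(count >= {alarm_count}), independent events: {independent_tail:.4f}")
print(f"P(count >= {alarm_count}), clustered events:   {clustered_tail:.4f}")
```

Applying the independent-Poisson tail probability to data generated by the clustered process would overstate how surprising a count of this size is.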
What the analysis shows
The combined effect of case-mix adjustment, correction for retrospective window selection, and adjustment for partial dependence is to substantially reduce the statistical anomaly in the COCH 2015–2016 cluster. The consensus of the independent statistical experts is that, once these corrections are applied, the cluster falls within the plausible range of natural variation for a unit with the COCH intake profile during that period.
This does not prove that no deliberate harm occurred. What it does is remove the statistical foundation for treating the cluster as itself evidence of deliberate harm. If the cluster is not anomalous, it cannot be used as independent corroboration for the clinical evidence on individual counts.
How this maps onto the conviction-safety question
The cluster evidence served two functions at trial. First, it was relied upon to establish a pattern suggesting a common cause. Second, it was implicitly used to set the prior probability of deliberate harm against which individual case evidence was evaluated. If the cluster is not statistically anomalous, both functions fail.
The CCRC, in considering whether the convictions are safe, will need to assess whether the jury, properly directed on the corrected statistical analysis, would necessarily have returned the same verdicts. The statistical experts’ consensus is a new and material development in that assessment. For the legal framework of how statistical evidence interacts with the CCRC referral test, see appeal vs CCRC distinction.