I recently finished the excellent book Math on Trial by Leila Schneps and Coralie Colmez. In it, the authors collect examples where statistical errors have possibly altered the outcome of trials. This weekend I’ll be on a panel about using statistics in science writing, and while the book looked at numbers in the courtroom, many of the lessons also apply to the way numbers are reported in the media. Reading this book really drove home to me the fact that this statistics stuff is important. The correct analysis and interpretation of statistics are sometimes a matter of life and death.
The story that really hit me in the gut was about Lucia de Berk, a Dutch nurse who was convicted of murdering several of her patients and later exonerated. I will not rehash the whole story, which is told very well in the book. The basic idea is that after the unexpected death of one of her patients, de Berk was accused of poisoning the child. And after she became a suspect in that case, the hospital where she worked started investigating past “suspicious incidents” where patients died or had to be resuscitated while under her care.
Ben Goldacre wrote about the de Berk case back in 2007, before the verdict was overturned, and again in 2010, right before the appeal that led to her exoneration. He describes one of the many statistical sins of the case this way: “To collect more data, the investigators went back to the wards to find more suspicious deaths. But all the people who have been asked to remember ‘suspicious incidents’ know that they are being asked because Lucia may be a serial killer. There is a high risk that ‘incident was suspicious’ became synonymous with ‘Lucia was present’. Some sudden deaths when Lucia was not present are not listed in the calculations: because they are in no way suspicious, because Lucia was not present.”
Thus, incidents were considered suspicious only if de Berk was present. Furthermore, the list of cases de Berk was accused of kept changing. On appeal in 2004, “Lucia was now found guilty of seven murders and three attempted murders—four new murders had been attributed to her on appeal, while only three of the four original murders for which she had been convicted in the first degree were among the new seven.” When someone discovered that she had not been at the hospital for one of the incidents on the list, “that particular death quietly disappeared from the list; no one asked any longer whether it had been natural or unnatural.”
At some point, the probability “one in seven billion” was attached to de Berk’s story in the press. Schneps and Colmez write, “It goes without saying that no mathematical justification of the number was ever included in the articles where it appeared. By unstated consensus, numbers in newspapers carry their own justification, or at least their own prestige, along with them.” In the trial, the statistic “1 in 342 million” was used as the likelihood that de Berk would have been present at so many incidents by chance. The number did not even change when cases were added to or removed from the “suspicious incidents” list!
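To get a feel for why the way the “suspicious incidents” list was assembled matters so much, here is a small sketch in Python. Every number in it is made up for illustration, and I am not claiming it reproduces the calculation behind either figure quoted above. It assumes a hypothetical ward with 1,000 shifts, a nurse who works 200 of them, and at most one incident per shift, with incidents landing on shifts purely at random. The function name chance_of_overlap is my own.

```python
from math import comb

def chance_of_overlap(total_shifts, nurse_shifts, incidents, incidents_with_nurse):
    """Probability that at least `incidents_with_nurse` of `incidents` fall on
    the nurse's shifts, if incidents land on distinct shifts uniformly at random
    (a hypergeometric tail probability)."""
    all_placements = comb(total_shifts, incidents)
    at_least_this_many = sum(
        comb(nurse_shifts, k) * comb(total_shifts - nurse_shifts, incidents - k)
        for k in range(incidents_with_nurse, min(incidents, nurse_shifts) + 1)
    )
    return at_least_this_many / all_placements

# Hypothetical ward: all numbers below are invented for illustration.
TOTAL_SHIFTS = 1000
NURSE_SHIFTS = 200

# Honest count: 20 incidents on the ward, 4 of them on the nurse's shifts,
# which is exactly what chance alone would predict (20% of 20).
unbiased = chance_of_overlap(TOTAL_SHIFTS, NURSE_SHIFTS, 20, 4)

# Biased count: the 16 incidents on other shifts are never listed as
# "suspicious", so the list shrinks to 4 incidents, all on her shifts.
biased = chance_of_overlap(TOTAL_SHIFTS, NURSE_SHIFTS, 4, 4)

print(f"all incidents counted:       {unbiased:.3f}")
print(f"only her incidents counted:  {biased:.5f}")
```

Nothing about the imaginary nurse changes between the two calculations. The only difference is which incidents made it onto the list, yet the second figure comes out far smaller than the first. That is the trap Goldacre describes: if “suspicious” ends up meaning “she was there,” an alarming-looking probability is almost guaranteed.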
The logical and statistical errors in the case were many, and Math on Trial has a thorough account of exactly what the errors were and why they were errors. Luckily, both for de Berk and for justice, two siblings of one of the doctors who had helped the prosecutor’s case started to worry about whether de Berk’s case was built on solid ground. Eventually their efforts led to the reopening of the case and de Berk’s exoneration.
If the statistics hadn’t been miscalculated and misunderstood by the prosecution, defense, media, and judges, de Berk might not have had to spend six years in jail for crimes she didn’t commit. These numbers matter, and as writers and citizens, we need to make sure we get them right.