January 28, 2013

How Should We Write about Statistics in Public?

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American.

A member of the order Lepidoptera enjoys my favorite green space in Chicago, Garfield Park Conservatory. There's a 50 percent chance that this is a below-average lepidopteran. Image: Evelyn Lamb.

I am excited to be attending ScienceOnline in Raleigh, North Carolina, later this week. And I'm even more excited to be co-moderating two sessions! One of them, at noon on Thursday, will be about Public Statistics. Hilda Bastian, my partner in crime, has written a cartoon introduction to our session, and I've been trying to think of what to write here about it. There have been a lot of statistics in the news this year, from Nate Silver to the "five-sigma" discovery of a Higgs-like particle to every health story ever. Where to start?

Last week I was flipping through the Chicago Reader over breakfast one morning and came upon the article "A greener Chicago would be a safer Chicago." In my sleepy morning state, my eyes glazed over a bit, but they latched onto a paragraph with several numbers in it. Numbers are important and objective (right?), so the part with the most numbers in it must make a clear, convincing argument for the author's main point.

Before I share and critique this excerpt, please know that I love community gardens, and I think it would be good if there were more of them. The thesis of this article is that urban vegetation provides many benefits to a community, including lower crime rates. I am not arguing for or against this position; I am stepping back and thinking about the way statistics are used in this paragraph and whether we should take them as supporting evidence for the article's conclusion. I also don't intend to insult or malign the author. I don't think he is stupid or dishonest, and the online version of the article does provide links to summaries of some of the studies he cites, which can help readers evaluate the claims themselves. I just think he might not have turned a skeptical eye to the statistics he quoted in the article and how they might be interpreted.

Without further ado, here's the paragraph that jumped out at me:

"A recent mapping of gardens [in Chicago] by University of Illinois researchers showed that the vast majority of Chicago residents—2.4 million out of 2.7 million—live in census tracts with no community gardens; that nearly half of these tracts have a poverty rate above the city average of 21 percent; and that most of these low-income tracts are on the south and west sides. These are areas with many sprawling vacant lots that would benefit from farming."

What do these numbers mean? The author is clearly trying to make a point, but to me, it's a bit confused and even somewhat contradictory. Almost 90 percent of Chicago residents don't live in a census tract with a community garden. But how big are census tracts? If a census tract is only a few square blocks, you could be quite close to a community garden and not get counted. Perhaps a better measure would be living in a tract adjacent to a tract with a community garden, or within two tracts. From the article, it is unclear. (For what it's worth, I looked it up, and it looks like my neighborhood, which is about 1.65 square miles, has 14 census tracts in it. My census tract does not have a community garden in it, but at least one adjacent tract does, and I think I'm a four-minute walk from that garden.)
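For anyone who wants to check the arithmetic, here is a minimal Python sketch that uses only the figures quoted above. The population numbers are the rounded ones from the article, and treating an average tract as a square is purely my simplifying assumption; real tracts vary in shape and size.

```python
import math

# Rounded figures quoted in the Chicago Reader article.
residents_without_garden = 2_400_000
city_population = 2_700_000

share_without_garden = residents_without_garden / city_population
print(f"Share of residents in tracts with no garden: {share_without_garden:.1%}")

# My neighborhood: about 1.65 square miles containing 14 census tracts.
neighborhood_area_sq_mi = 1.65
tract_count = 14

avg_tract_area = neighborhood_area_sq_mi / tract_count
print(f"Average tract area: {avg_tract_area:.3f} square miles")

# Assumption for illustration only: pretend the average tract is a square.
side_miles = math.sqrt(avg_tract_area)
print(f"Side of an equivalent square tract: {side_miles:.2f} miles")
```

If the tracts in my neighborhood are at all typical, each one is only about a third of a mile across, which is why a garden one tract over can still be a short walk away.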

The article continues, "nearly half of these tracts have a poverty rate above the city average of 21 percent." Is that good or bad? Put another way, "more than half of these tracts have a poverty rate at or below the city average of 21 percent." That sounds like a different story. But beyond the "nearly half" vs "more than half" issue, how should we assume poverty is distributed in the city? Do the tracts have very similar populations, or do affluent areas have more census tracts per capita? Overall, how many tracts have above- and below-average poverty? I honestly don't know what we should assume about this distribution, but on first reading, it doesn't sound too bad for about half of the census tracts to have above-average poverty. It sounds about as bad as "half of our students are below average," a fairly meaningless but generally true statement (exactly true only if "average" means the median). Furthermore, in a sample of 2.4 million out of 2.7 million residents, we would expect the statistics to be very close to the statistics for the city as a whole; only a large deviation from those numbers would be remarkable. Without information about the percentage and location of high-poverty census tracts in the city in general, we are unable to make a meaningful comparison of the areas with urban gardens to those without.
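To make the point about distributions concrete, here is a small Python sketch. Every number in it is made up for illustration: the two lists of tract poverty rates are hypothetical, both constructed to have a mean of 21 percent, and the 50 percent citywide share is a placeholder, not real Chicago data. It also makes two simplifying assumptions: the plain average of tract rates stands in for the city average (which is really weighted by population), and tracts are assumed to have equal populations, so that tracts holding 2.4 of 2.7 million residents make up the same fraction of all tracts.

```python
def share_above_mean(rates):
    """Fraction of values strictly greater than the (unweighted) mean."""
    mean = sum(rates) / len(rates)
    return sum(1 for r in rates if r > mean) / len(rates)


# Hypothetical tract poverty rates in percent; both lists average 21.
right_skewed = [5, 8, 10, 12, 15, 18, 20, 25, 40, 57]    # a few very poor tracts
left_skewed = [2, 5, 22, 23, 24, 25, 26, 27, 28, 28]     # a few very affluent tracts

print(share_above_mean(right_skewed))  # share of tracts above the mean
print(share_above_mean(left_skewed))


def subset_share_bounds(citywide_share, subset_fraction):
    """Range of possible above-average shares within a subset of tracts.

    citywide_share: fraction of ALL tracts above average (hypothetical input).
    subset_fraction: fraction of all tracts that belong to the subset.
    The bounds come from putting as many or as few above-average tracts
    as possible into the excluded remainder.
    """
    excluded = 1 - subset_fraction
    low = max(0.0, (citywide_share - excluded) / subset_fraction)
    high = min(1.0, citywide_share / subset_fraction)
    return low, high


# Assumption: equal tract populations, so the subset is 2.4/2.7 of all tracts.
low, high = subset_share_bounds(citywide_share=0.5, subset_fraction=2.4 / 2.7)
print(f"{low:.4f} to {high:.4f}")
```

With the same mean, the share of tracts above it is 30 percent in one made-up list and 80 percent in the other, so "nearly half" could be better or worse than the baseline depending on the shape of the real distribution. And because the garden-free tracts are such a large part of the city, their share cannot stray far from the citywide one: under these assumptions, a citywide share of one half pins it between roughly 44 and 56 percent.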

While doing some research for this post, I realized that the author took these numbers almost word for word from the research paper he mentioned (sorry, it's Elsevier, and there's a paywall). The paper includes the figures somewhat in passing and does not editorialize about the south and west sides benefiting from urban farming. It is about using Google Earth to track urban farming and get a more accurate idea of the numbers and types of urban gardens in Chicago. Why does the author of the Chicago Reader piece feel the need to quote these statistics? Clearly, using numbers seems to give the argument more credibility, and his readers may well respond to numbers this way.

This article is not an isolated incident. Statistics are used and misused all over newspapers, magazines, and the Internet. And they're necessary. Without them, science papers can't accurately describe the size of an effect or the probability that it was due purely to chance, and reporters can't let people know what a new study means. How can we, as bloggers, reporters, and editors, increase the quality of statistics reporting in the media? And what should the media consumer look out for when reading these stories?

If you're going to ScienceOnline, I cordially invite you to come talk about statistics with us. We'll be talking about our statistics reporting pet peeves, how to write about statistics responsibly without boring our readers, and resources for those of us who would like a refresher course in what all those numbers in science papers mean. We'll also talk about some of the biggest stories in statistics from the past year and where the media got statistics right and wrong.

Whether you'll be at the session or not, feel free to share your public statistics pet peeves, resources, and requests for resources in the comments. You can follow along with our session on Twitter on Thursday. We'll be using the hashtag #PublicStats. The hashtag for the (un)conference itself is #scio13.

Finally, if you have access to some data about the distribution of poverty in Chicago census tracts, I would love to learn about it!