Roots of Unity

Mathematics: learning it, doing it, celebrating it.

How Should We Write about Statistics in Public?

The views expressed are those of the author and are not necessarily those of Scientific American.





A member of the order Lepidoptera enjoys my favorite green space in Chicago, Garfield Park Conservatory. There's a 50 percent chance that this is a below-average lepidopteran. Image: Evelyn Lamb.

I am excited to be attending ScienceOnline in Raleigh, North Carolina, later this week. And I’m even more excited to be co-moderating two sessions! One of them, at noon on Thursday, will be about Public Statistics. Hilda Bastian, my partner in crime, has written a cartoon introduction to our session, and I’ve been trying to think of what to write here about it. There have been a lot of statistics in the news this year, from Nate Silver to the “five-sigma” discovery of a Higgs-like particle to every health story ever. Where to start?

Last week I was flipping through the Chicago Reader over breakfast one morning and came upon the article “A greener Chicago would be a safer Chicago.” In my sleepy morning state, my eyes glossed over the page a bit, but they latched onto a paragraph with several numbers in it. Numbers are important and objective (right?), so the part with the most numbers in it must make a clear, convincing argument for the author’s main point.

Before I share and critique this excerpt, please know that I love community gardens, and I think it would be good if there were more of them. The thesis of this article is that urban vegetation provides many benefits to a community, including lower crime rates. I am not arguing for or against this position; I am stepping back and thinking about the way statistics are used in this paragraph and whether we should take them as supporting evidence for the article’s conclusion. I also don’t intend to insult or malign the author. I don’t think he is stupid or dishonest, and the online version of the article does provide links to summaries of some of the studies he cites, which can help readers evaluate the claims themselves. I just think he might not have turned a skeptical eye to the statistics he quoted in the article and how they might be interpreted.

Without further ado, here’s the paragraph that jumped out at me:

“A recent mapping of gardens [in Chicago] by University of Illinois researchers showed that the vast majority of Chicago residents—2.4 million out of 2.7 million—live in census tracts with no community gardens; that nearly half of these tracts have a poverty rate above the city average of 21 percent; and that most of these low-income tracts are on the south and west sides. These are areas with many sprawling vacant lots that would benefit from farming.”

What do these numbers mean? The author is clearly trying to make a point, but to me, it’s a bit confused and even somewhat contradictory. Almost 90 percent of Chicago residents don’t live in a census tract with a community garden. But how big are census tracts? If a census tract is only a few square blocks, you could be quite close to a community garden and not get counted. Perhaps a better measure would be living in a tract adjacent to a tract with a community garden, or within two tracts. From the article, it is unclear. (For what it’s worth, I looked it up, and it looks like my neighborhood, which is about 1.65 square miles, has 14 census tracts in it. My census tract does not have a community garden in it, but at least one adjacent tract does, and I think I’m a four-minute walk from that garden.)
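If the tract map and garden locations were available as data, a “tracts away from the nearest garden” measure would be straightforward to compute. Here is a minimal sketch; the tract names and adjacencies are toy data I made up, not real Chicago tracts:

    from collections import deque

    # Toy data: which tracts border which, and which contain a garden.
    adjacency = {
        "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"],
    }
    garden_tracts = {"A"}

    def hops_to_nearest_garden(adjacency, garden_tracts):
        """Multi-source breadth-first search over the tract graph:
        0 = tract has a garden, 1 = an adjacent tract does, and so on."""
        dist = {tract: 0 for tract in garden_tracts}
        queue = deque(garden_tracts)
        while queue:
            tract = queue.popleft()
            for neighbor in adjacency[tract]:
                if neighbor not in dist:
                    dist[neighbor] = dist[tract] + 1
                    queue.append(neighbor)
        return dist

    print(hops_to_nearest_garden(adjacency, garden_tracts))
    # {'A': 0, 'B': 1, 'C': 2, 'D': 3}

With real tract polygons, the adjacency would come from shared borders, and “lives within two tracts of a garden” is then just checking whether a tract’s distance is at most 2.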

The article continues, “nearly half of these tracts have a poverty rate above the city average of 21 percent.” Is that good or bad? Put another way, “more than half of these tracts have a poverty rate at or below the city average of 21 percent.” That sounds like a different story. But beyond the “nearly half” vs “more than half” issue, how should we assume poverty is distributed in the city? Do the tracts have very similar populations, or do affluent areas have more census tracts per capita? Overall, how many tracts have above- and below-average poverty? I honestly don’t know what we should assume about this distribution, but on first reading, it doesn’t sound too bad for about half of the census tracts to have above-average poverty. It sounds about as bad as “half of our students are below average,” a fairly meaningless but generally true statement. Furthermore, in a sample of 2.4 million out of 2.7 million citizens, we would expect the statistics to be very close to the statistics for the city as a whole; only a large deviation from those numbers would be remarkable. Without information about the percentage and location of high-poverty census tracts in the city in general, we are unable to make a meaningful comparison of the areas with urban gardens to those without.
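To make that intuition concrete, here is a minimal simulation with invented numbers, not real census data: if tract poverty rates are spread roughly symmetrically around the citywide average, about half of the tracts will land above it, and a subset covering almost 90 percent of the city will have an average very close to the citywide one.

    import random

    random.seed(0)
    NUM_TRACTS = 866  # Chicago's tract count (see a comment below); rates are invented

    # Assume each tract's poverty rate scatters symmetrically around the
    # 21 percent citywide average (an assumption, for illustration only).
    rates = [min(max(random.gauss(21.0, 8.0), 0.0), 100.0)
             for _ in range(NUM_TRACTS)]

    city_avg = sum(rates) / len(rates)
    above = sum(1 for r in rates if r > city_avg)
    print(f"{above}/{NUM_TRACTS} tracts ({above / NUM_TRACTS:.0%}) "
          f"are above the citywide average")

    # A random ~89 percent of tracts (a stand-in for the garden-less ones)
    # barely differs from the city as a whole.
    subset = random.sample(rates, int(0.89 * NUM_TRACTS))
    print(f"subset average {sum(subset) / len(subset):.1f}% "
          f"vs. citywide {city_avg:.1f}%")

On this toy model, “nearly half above average” is exactly what you would expect even if gardens had no connection to poverty at all.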

While doing some research for this post, I realized that the author took these numbers almost word for word from the research paper he mentioned (sorry, it’s Elsevier, and there’s a paywall), which includes the figures somewhat in passing and does not editorialize about the south and west sides benefiting from urban farming. The paper is about using Google Earth to track urban farming and get a more accurate idea of the numbers and types of urban gardens in Chicago. Why does the author of the Chicago Reader piece feel the need to quote these statistics? Presumably because numbers seem to give the argument more credibility, and his readers may well respond to them that way.

This article is not an isolated incident. Statistics are used and misused all over newspapers, magazines, and the Internet. And they’re necessary. Without them, science papers can’t accurately describe the size of an effect or how likely such an effect would be to appear by chance alone, and reporters can’t let people know what a new study means. How can we, as bloggers, reporters, and editors, increase the quality of statistics reporting in the media? And what should the media consumer look out for when reading these stories?

If you’re going to ScienceOnline, I cordially invite you to come talk about statistics with us. We’ll be talking about our statistics reporting pet peeves, how to write about statistics responsibly without boring our readers, and resources for those of us who would like a refresher course in what all those numbers in science papers mean. We’ll also talk about some of the biggest stories in statistics from the past year and where the media got statistics right and wrong.

Whether you’ll be at the session or not, feel free to share your public statistics pet peeves, resources, and requests for resources in the comments. You can follow along with our session on Twitter on Thursday. We’ll be using the hashtag #PublicStats. The hashtag for the (un)conference itself is #scio13.

Finally, if you have access to some data about the distribution of poverty in Chicago census tracts, I would love to learn about it!

About the Author: Evelyn Lamb is a postdoc at the University of Utah. She writes about mathematics and other cool stuff. Follow on Twitter @evelynjlamb.


Comments (11)

  1. ebenari 10:46 pm 01/27/2013

    One good resource for journalists is News and Numbers: A Writer’s Guide to Statistics, by Victor Cohn and Lewis Cope.
    I hope you don’t mind if I also plug a somewhat more narrowly focused article by a colleague of mine, which does a great job of explaining the statistics used in studies on cancer screening (here: http://www.cancer.gov/ncicancerbulletin/112712/page4). A lack of understanding of these statistics, including by many physicians, is often at the root of controversies such as those on the benefits and harms of mammography and PSA testing.

  2. MedicalQuack 1:01 am 01/28/2013

    Good article, and accurate information is a problem for sure. I remind people to watch this video all the time: “It’s all about context,” by Charles Seife from NYU. He’s a mathematician and does a good job bringing this message to the layman, so much so that it is one of my five education videos on the left side of my blog. You have to love “the fish was dead.”

    http://ducknetweb.blogspot.com/2012/01/context-is-everythingmore-about-dark.html

    I should also put a link in here to one of the hottest pieces of information-compiling software I have seen. This guy should make a mint off his patent, as it can prepare a study and more. I work in health care, but this is right in there with capturing and formatting what’s on the web. I have no clue what the cost is, but anyone writing studies or reports should take a look, and I will be doing the same shortly. He created 800,000 books with his software and put them on Amazon for sale. How much better can formatting get than having the software write it all in Word? Of course, one could always proofread before tossing it out there, which I think anyone would; the inventor knew his own material, which is why I think he was able to crank out so many books without needing a ton of proofreading.

    http://ducknetweb.blogspot.com/2013/01/programmer-creates-800000-books-using.html

  3. randoo 1:07 am 01/28/2013

    I think it’s up to the reader to critically evaluate all statistics published in popular media. They are most often designed to lead the reader to a particular conclusion; they are usually arguments rather than bald data, and often fallacious at that. I see the post hoc fallacy so often in published statistics that I’m confused about whether it’s meant to be humorous or merely misleading! In the article you write about here, the statistics seem unnecessary. Why not simply state the number of community gardens and include a simple map showing their geographical locations? This could be correlated with another map showing income levels or whatever other data set you might want to compare. Let the reader decide whether the data support the thesis. Advice from a voracious reader who knows nothing in particular about statistics: keep it graphical and refrain from including your conclusions in the numbers. A sound argument will stand on its own.

  4. priddseren 1:18 am 01/28/2013

    My profession requires the use of statistics. I use them, write them, and write computer models for all kinds of what-if scenarios, predictive analysis, and a variety of other purposes, but statistics are not fact or truth.

    The problem with science and medicine is that their use of statistics is done in lieu of real science. For publications such as SA or Nature to be of any use, they must ensure the understanding that statistics are interesting, help provide information for further study, and even support some preliminary guesses, but that these results are not in any way facts or proof of anything real.

    The Higgs boson “discovery” is a good example. It has not been discovered. At best, statistics show that they have found something interesting at two of the predicted energies for the Higgs. That is about all. The statistics can’t even completely rule out its being an anomaly or error; they just render that very unlikely. But one thing the statistics do not prove is what was found. Other information, such as the theory behind the Higgs, gives solid math predicting that what was found is most likely the Higgs, but again, it is not proven. We only know that something is most likely there, that it is most likely not an error, and that it is probably the Higgs.

    For medicine, it’s clinical studies. “Clinically proven” is synonymous with “we don’t know anything.” We gave the medicine to some people, they didn’t die, and here are all the side effects. We think, through statistics, that the desired effect occurred, but we have no way to prove how the medicine worked or whether it was in fact the medicine.

    Compare that with real science, such as Einstein’s demonstration that light bends, using a real eclipse and photographs of real stars being measured. There was no need for Einstein to come up with various sigma definitions, because anyone can look at the old photographic plates and measure the stars.

    We know, for example, that penicillin binds to an enzyme necessary for bacteria to divide, preventing that division and eventually rupturing the bacteria. That is a proven fact, not a statistical five-sigma belief that it isn’t an error.

    As long as science continues to push statistics as if they were fact, because the same scientists want grant money and politicians refuse to wait for real results, we will continue to have this problem of statistics being used in place of facts.

  5. FrenchToaster 6:16 am 01/28/2013

    The author has selected a very important subject! As statistics is employed everywhere to draw meaning from data, using statistical analysis effectively to help communicate and persuade is an immediate issue. I hope her effort goes well, and wish her the best of luck popularizing and demystifying this inferential toolkit.

  6. jtdwyer 7:59 am 01/28/2013

    Good posting – as was your earlier, linked post, “5 Sigma—What’s That?”

    However, the most significant issue with the reporting of the purported Higgs boson ‘discovery’ was not uncertainty about the meaning of the term ‘5 sigma’ but the all-too-common misstatements that the Higgs boson had almost certainly been discovered! In fact, the CERN experiments cannot directly detect Higgs bosons, as they decay too quickly – the experiments can only infer the presence of some particle, manifested at an energy level of ~125 GeV, that decayed into the two photons that were actually detected.

    An example clarification that I posted to one news article is used here for illustration:

    The CERN CMS announcement http://cms.web.cern.ch/news/observation-new-particle-mass-125-gev states:

    “CMS observes an excess of events at a mass of approximately 125 GeV with a statistical significance of five standard deviations (5 sigma) above background expectations.”

    The Fig. 5 caption states:

    “The observed probability (local p-value) that the background-only hypothesis would yield the same or more events as are seen in the CMS data, as a function of the SM Higgs boson mass for the five channels considered. The solid black line shows the combined local p-value for all channels.”

    The 5 sigma statistic applies only to the probability that the detected boson decay products were NOT just the result of background detector noise. The experimental data has been cross-checked by two ‘independent’ groups of (CERN) researchers…
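    (For concreteness, here is a minimal sketch of what that 5 sigma figure translates to as a one-sided tail probability of a standard normal, the usual particle physics convention; this is my own illustration, not a calculation from the CMS paper:)

        from scipy.stats import norm

        # One-sided tail probability beyond 5 standard deviations:
        # the chance that background noise alone would fluctuate this high.
        p_value = norm.sf(5.0)  # survival function, 1 - cdf
        print(f"{p_value:.2e}")  # ~2.87e-07, about 1 in 3.5 million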

    Additional information, including particle properties and how they interact with other particles, will be necessary to confirm the identity of the indirectly inferred new bosons. As mentioned above, the bosons themselves are so unstable that they cannot be directly detected: their presence is inferred from the detection of some of their expected decay products. As I understand it, many more analyses and perhaps additional experiments will be required to more definitively identify the new particles.

    There seems to be a great deal of confusion about what the Higgs mechanism is supposed to do: it’s often claimed, for example, that the Higgs boson is “the particle that gives matter mass and holds the physical fabric of the universe together.” However, a very enlightening press release, http://cds.cern.ch/journal/CERNBulletin/2012/06/News%20Articles/1420890?ln=en, explains that the Higgs mechanism was hypothesized only to explain how the W and Z bosons, mediators of the weak force (interaction), acquired mass while photons (mediators of the electromagnetic interaction) and gluons (mediators of the strong interaction) did not. Critically, it then goes on to explain (conflicting with most information sources):
    “Interactions with the Higgs field are not just reserved for force-carrying particles. The theory can also explain how all other fundamental particles acquire their rest mass. But don’t make the mistake of thinking the Higgs field is responsible for all mass. Interaction with the field actually contributes less than 1 kg to the mass of an average person. Your remaining mass comes from the energy of the various forces holding your body together – mainly the strong force binding quarks inside nucleons, with a tiny contribution from the electromagnetic force that reigns over the atomic scale.”

    With innumerable news reports stating unequivocally that ‘the Higgs boson explains where mass comes from and, with its almost certain discovery, we now understand all the secrets of the universe,’ IMO the difficulty of explaining statistical significance to the public is only a minor part of the misrepresentation of scientific information in news reports…

  7. bogira 11:44 am 01/28/2013

    I appreciate the criticism, Evelyn, and your wish that we journalists would evaluate the statistics we cite, and think more carefully about how we use them.

    In my post, I briefly describe three recent studies on vegetation and crime, which suggest that the former may somewhat depress the latter. The Google mapping I then referred to was meant to show that there’s plenty of “room for growth” in the neighborhoods in which crime is a particular problem–which in Chicago are the poorer neighborhoods on the south and west sides.

    I’m sure you’re right that most readers don’t know how big a census tract is, and it would have helped to explain that parenthetically. Tracts are set by population–they usually have about 2,000-4,000 residents, but Chicago’s 866 tracts were mapped decades ago, and, for the sake of comparison over time, they haven’t been changed. A typical tract is about four blocks by two blocks. So, yes, you can be without a community garden in your census tract but still be only a few minutes’ walk from one. It’s hard to generalize from the three studies I relied on how localized any crime-depressing impact of vegetation might be.
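    (Those tract figures hang together with the article’s population numbers – a quick back-of-the-envelope check, with the arithmetic being mine:)

        # 866 tracts at roughly 2,000-4,000 residents each should bracket
        # the 2.7 million Chicago residents cited in the article.
        print(866 * 2000, 866 * 4000)
        # 1732000 3464000 -- 2.7 million sits comfortably in that range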

    The “nearly half of these tracts have a poverty rate above the city average” wasn’t intended to help readers “make a meaningful comparison of the areas with urban gardens to those without.” It was intended, again, to show that there’s lots of room for more urban gardening in the poorer areas that perhaps could benefit most from it in terms of crime-reduction.

  8. Evelyn Lamb in reply to bogira 11:58 am 01/28/2013

    Thanks so much for replying, bogira. I really appreciate it. That really cleared up your argument for me; it wasn’t what I assumed when reading the piece initially, but I see your intent now. And as I said, I love community gardens, particularly the ones that I get to walk by while going about my day on the south side. I hope we can get more of them. Cheers!

  9. evelynjlamb 12:38 pm 01/28/2013

    randoo, I disagree that readers should have to sort these things out for themselves. It would be great if everyone had the time and training to do that and a healthy skepticism about numbers they read. But what is the purpose of science and health reporting if not to inform readers of new studies and their validity and importance? If there weren’t a need for an intermediary between science journal articles and the general public, we wouldn’t have jobs.

  10. LaneG 11:52 am 01/30/2013

    It should quite simply be cause for firing or reassignment when reporters imply a causation or effect that has not been shown by the underlying research, and for editors who let this slip. I can’t remember the formula for standard deviation by heart, but I can remember “correlation doesn’t imply causation.” It’s not hard to say, by the by, that “though A is correlated with B, it is not proven that A causes B.” Readers get it.

    The problem isn’t really good magazines and newspapers, most of which employ journalists capable of making these distinctions. It’s the filtering down to lower-end forms of journalism like talk radio and TV. I’m afraid they’re unshamable, since they live off of bite-sized “red wine means you won’t get heart disease”-level “reporting.”

  11. Shecky R. 7:54 pm 01/30/2013

    There’s a GREAT new book out right now (I’m surprised no one has mentioned it) by Charles Wheelan called “Naked Statistics,” which is the best intro to statistics for layfolks I’ve ever seen; not at all technical, but it makes very important points with real-life examples while being entertaining.

