The Data Journalism Handbook

Edited by Jonathan Gray, Lucy Chambers, Liliana Bounegru

Publisher (Paper version): O'Reilly Media

Released: May 2012

Pages: 120

Also available as a free download.

In this blog post, Julian Champkin, editor of Significance, the outreach publication of the Royal Statistical Society and the American Statistical Association, reviews the newly released The Data Journalism Handbook and goes a step further, reminding us that the world of journalism is undergoing a revolution. To be part of that revolution, Champkin argues, young and early-career science writers should not shy away from new data-heavy sources or new tools of investigation. Instead, he believes they should explore and experiment with data to produce strong, relevant and exceptional science journalism.


Once upon a time, journalism worked like this. A reporter travelled to a distant scene. He telephoned his words to an office. For a front-page story a thousand words would be a lot: call it 8,000 bits, or one kilobyte, of information content in the language of today's information theory. In the office a copytaker typed those words onto compressed woodpulp. A sub-editor would cut them about so that they would fit into the space available. Then lead-antimony was melted down, generally on the ground floor of the same building, and squeezed into tiny letter-shaped moulds in cast-iron machines operated by incredibly skilled craftsmen who (in Britain) were generally on strike, and at about three o'clock in the afternoon huge presses would start to rumble and shake the entire building as the printing of the next day's newspapers began. Pre-history? That was how it was when I began in London's Fleet Street, about 25 years ago.

Now, the world is more complex than it was, and more digital than it was. Hot metal gave way to electronic word processing and nobody really mourned its passing.  A revolution just as huge is happening now.

One of the biggest news stories of the past year has been the content of, and fallout from, the Wikileaks release of US diplomatic cables. A single source, the unfortunate Bradley Manning, downloaded a database of information that it would be trivialising to call a mere deluge: 251,287 documents, totalling 261 million words, a staggering number. It is about a gigabyte of Shannon information content.
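A back-of-envelope check of that estimate (the average word length used below is an assumption, not a figure from the review):

```python
documents = 251_287
words = 261_000_000
chars_per_word = 6   # rough average for English, including the trailing space (assumption)
bytes_per_char = 1   # plain ASCII text

raw_bytes = words * chars_per_word * bytes_per_char
print(f"{raw_bytes / 1e9:.1f} GB of raw text")            # ~1.6 GB: on the order of a gigabyte
print(f"~{words // documents} words per document on average")
```

So the "about a gigabyte" figure is the right order of magnitude for the raw text, and each cable averages roughly a thousand words, the length of that old front-page story.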

Suppose as a journalist you are faced with that mountain of virtual documents. What on earth are you to do with it? 99.99% of the documents will in all probability be dross and dull as ditchwater. How do you find the one in ten thousand (which still amounts to a big number) that will make news? How could anyone make sense of it all? It is the job of a journalist to make sense of things. But surely, this is the job of an analyst, an information processor, a data miner, a statistician, an IT expert, a programmer, a code-writer, right? Actually, all of these are skills the new journalist, or journalism team, will need.
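To make that triage problem concrete: a first pass over such an archive often starts with nothing more sophisticated than a keyword filter. The sketch below is purely illustrative; the directory layout and the keyword list are hypothetical, not drawn from any actual newsroom workflow.

```python
from pathlib import Path

# Hypothetical terms a reporter might start from.
KEYWORDS = {"corruption", "bribe", "weapons"}

def newsworthy(text: str) -> bool:
    """Crude first pass: flag any document mentioning a keyword of interest."""
    lowered = text.lower()
    return any(word in lowered for word in KEYWORDS)

def triage(cable_dir: str) -> list[Path]:
    """Return candidate documents from a directory of plain-text files."""
    return [path for path in Path(cable_dir).glob("*.txt")
            if newsworthy(path.read_text(errors="ignore"))]
```

Even a filter this crude would cut a quarter of a million documents down to a readable shortlist; real data-journalism pipelines layer entity extraction, deduplication and, crucially, human judgement on top.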

Journalism, if you hadn't noticed, is changing. Some bits of it are actually becoming better. Those bits are to do with numeracy and with numbers. Even a few years ago very few journalists, even in reputable newspapers and broadcasters such as the BBC, knew what to do with a story involving even basic numeracy, let alone databases. Now there is a small but growing band of journalists who can handle numbers. A larger band are becoming aware that just because an official or a press release quotes some numbers, those numbers might not mean what the official or the press release is claiming: they can be fudged or partial or cherry-picked, they can be comparing like with unlike, and so on. And there is another small but growing band of journalists, Hans Rosling and David McCandless among them, who realise that hidden inside huge databases of numbers are wonderful stories about the world: human stories that are interesting and important and exciting, just waiting to be extracted from the numbers. It needs analysis to uncover those stories, and it needs clever ways to explain them: new ways of drawing graphs and of visualising data, for starters. But those stories are there, and so are those ways of explaining them.

All this is called data journalism.

Data journalism is such an apparently new field that I have not heard of it being taught in journalism courses. A few pioneers have worked out for themselves how to do it and have penetrated newsrooms and broadcasting offices to practise it; with digitisation, and with public access to data, it has been quietly and steadily growing. The best newspapers and the best broadcasters are doing it, and will be doing it more and more. This book is the first guide to it.

It grew out of a conference held at the end of 2011. It has been edited by pioneers of the genre, and contributed to by those who have practised it: more than 70 of them, from places as diverse as Japan and Finland, Nigeria and the US, working for news outlets such as the New York Times, Zeit Online, the BBC and the Guardian. At this stage of the game the handbook is no more than an outline of what is possible, but that in itself is immensely valuable. And it does have practical guidelines for those who want to join the field.

So what is data journalism and how is it different from the other kind? The book gives several definitions, from several different contributors. "Perhaps it is the new possibilities that open up when you combine the traditional 'nose for news' and ability to tell a compelling story, with the sheer scale and range of digital information now available" is one.

"It is about connections" is another, as witnessed by Steve Doig in 1993. He joined two different datasets from Hurricane Andrew: one mapping the level of destruction caused by the hurricane and one showing wind speeds. This allowed him to pinpoint areas where weakened building codes and poor construction practices contributed to the impact of the disaster. He won a Pulitzer Prize for the story. And the example continues: "Today news stories are flowing in as they happen, from multiple sources, eye-witnesses, blogs and what has happened is filtered through a vast network of social connections, being ranked, commented and more often than not: ignored. This is why data journalism is so important. Gathering, filtering and visualizing what is happening beyond what the eye can see has a growing value."

In a profession (if that is what journalism is) under siege, new ways of working are needed to achieve new things. There is a question-mark over the future of print. Digital is clearly the way forward; but digital journalism is not just for the web. Its stories appear in, and sometimes dominate, the traditional media and the traditional news agenda.

Journalism, they say, has been democratised. In a world where anyone can blog, with no filter for accuracy or truth, and their blog can be read instantly and taken as fact by millions, it has been claimed that everyone is now a journalist.

But data journalism restores skill, or rather skills, to the job. It needs skill in analysing the vast amounts of data now routinely available to find the story lurking within. And when you have done that, it needs skill in telling that story to the readers, or viewers, or listeners: print, audio, video, diagrams, graphics, and visualisations, interactive or not, can all be involved. Take another example quoted in the book:

“The Las Vegas Sun in 2010 ran a Do No Harm series on hospital care. They analyzed more than 2.9 million hospital billing records, which revealed more than 3,600 preventable injuries, infections and surgical mistakes. They obtained data through a public records request and identified more than 300 cases in which patients died because of mistakes that could have been prevented. Their presentation contained, among other elements, an interactive graphic which allows the reader to see by hospital, where surgical injuries happened more often than would be expected; a map with a timeline that shows infections spreading hospital by hospital; and another interactive graphic that allows users to sort data by preventable injuries or by hospital to see where people are getting hurt. The Nevada legislature responded with six pieces of legislation.”

Clearly the days of the single-medium report are vanishing, as this very website witnesses. Scientific American began as a print magazine; its site is now an integral part of its operations.

So the media of reporting are changing. Information, or data (call it which you will), has changed already. Before, there was not much of it: the star journalist was the one who found that extra nugget which made the story. Now the problem can be put as Too Much Information: the star journalist is the one who can disregard the information that is meaningless and tie together the vast amount of the rest to find the story. As the book puts it, journalism now is about processing, which consists of analysing and presenting.

One thing has not changed: the definition of journalism. It is finding out what is happening, and telling people what is happening, in a timely way and in a clear way that engages them. The war reporters of old; the crime reporter racing to the phone box to get his story in (did they ever really say "Hold the front page"?); the door-stepping hack, and in recent years the devalued mobile-phone hacker: they were all trying to get the story out. And data journalists have the same aim. The source is different, the methods are different; the aim, at its best, is the same.

This book is a first. There will be more like it. But read this one, because it is happening now.

Julian Champkin