Is Big Data going to revolutionize science and help us make a better world? Not based on what it's done so far.
Let me back up a moment. I was recently a speaker at How the Light Gets In, a groovy philosophy and music festival in Hay-on-Wye, Britain. The festival lodged me in a fantastical mansion called Great Brampton House, where I hung out with other festival speakers, like physicists George Ellis, Carlo Rovelli, Carlos Frenk and Tara Shears; biologist Rupert Sheldrake; psychiatrist David Nutt; and journalists Colin Tudge and David Malone. (I hope to post Q&As with Ellis and Sheldrake soon.)
One afternoon, I participated in a public debate about Big Data with journalists Kenneth Cukier and Angela Saini and sociologist Laurie Taylor. The festival brochure blurbed our session as follows: "In an age when we can collect information in unimaginable quantities, will we replace simplifying theories with complex real patterns? Might Big Data be the end of theory?" These are questions posed by Cukier, data editor for The Economist, and Viktor Mayer-Schonberger, professor of Internet governance at Oxford, in their 2013 bestseller Big Data: A Revolution That Will Transform How We Live, Work, and Think.
In an essay based on their book, they write: "Big data starts with the fact that there is a lot more information floating around these days than ever before, and it is being put to extraordinary new uses. Big data is distinct from the Internet, although the Web makes it much easier to collect and share data. Big data is about more than just communication: the idea is that we can learn from a large body of information things that we could not comprehend when we used only smaller amounts."
Their most intriguing assertion is that Big Data will allow us to solve problems without necessarily understanding them. Big Data will shift the emphasis of researchers from "causation to correlation," Cukier and Mayer-Schonberger write. "This represents a move away from always trying to understand the deeper reasons behind how the world works to simply learning about an association among phenomena and using that to get things done." Former WIRED editor Chris Anderson made similar claims in his 2008 essay "The End of Theory."
If Big Data means digital technologies, I love Big Data. Digital technologies have transformed the way journalists as well as scientists gather, analyze and disseminate information. With my MacBook Air, I can Google Cukier without leaving my room and in an instant find reviews of his book—including a surprisingly positive one by often-cranky Michiko Kakutani of The New York Times.
Moreover, Cukier is right that science can achieve a lot merely by uncovering correlations. Epidemiological studies demonstrated more than a half century ago a strong correlation between smoking and cancer. We still don't understand exactly how smoking causes cancer. The discovery of the correlation nonetheless led to anti-smoking campaigns, which have arguably done more to reduce cancer rates over the past few decades than all our advances in testing and treatment (as I point out in a recent post).
I'll also grant Cukier's point that theory can impede problem-solving. Let's say, for example, you are a judge pondering whether a convicted murderer might kill again. You could ask a psychiatrist or other so-called mind-expert to make a prediction based on the expert's pet psychological paradigm. But you're much better off using the method that insurance companies employ to calculate rates for policy-holders; that is, just look at recidivism rates of criminals with backgrounds like that of your murderer.
The enthusiasm of Cukier and others for Big Data nonetheless irks me, for several reasons. First, their rhetoric reminds me of the hype generated by the fields of chaos and its successor, complexity, which in my 1996 book The End of Science I lumped together under the term "chaoplexity." Both fields promised that with faster computers and more sophisticated software, scientists could solve problems that had resisted analysis by stodgy old reductionist methods. Some chaoplexologists hoped to discover profound new principles governing the "self-organization" of a wide range of complex phenomena—and possibly even an "anti-entropy" force.
These discoveries never happened, and neither have the kinds of practical advances envisioned by Cukier and Mayer-Schonberger. Take genetics. The Human Genome Project was completed in 2003 in less time and for less money than had been expected because of advances in computers and other technologies. The cost of extracting and analyzing genetic data from humans and other organisms has continued to plummet.
But all this progress has produced disappointingly few medical advances. At this writing, not a single gene therapy has been approved for commercial sale in the U.S.; only one has been approved in Europe. The war on cancer has been a bust, as has the effort to find specific genes underpinning complex behavioral traits and disorders.
Just as geneticists are drowning in data, so are neuroscientists. In spite of the increasing power of scanners and other tools, neuroscientists still can't explain exactly how brains make minds, or why our minds often work so badly. Thomas Insel, director of the National Institute of Mental Health, recently advocated overhauling our methods of defining and diagnosing schizophrenia, depression and other mental illnesses. Our treatments for these illnesses also remain appallingly primitive.
The economic crash of 2008 provides another reality check for Big Data. Wall Streeters have the fastest computers, most sophisticated software and biggest databases money can buy, and yet many failed to see the 2008 crash coming. The hope that Big Data will make economics and other social sciences truly scientific (that is, precise and predictive) remains, for now, a fantasy.
I assume—I hope—that our ever-improving information technologies will one day yield truly revolutionary advances in medicine, social sciences and other fields. But until that day arrives, let's keep a lid on the hype about Big Data.
Further Reading: Are “Big Data” Sucking Scientific Talent into Big Business?