April 25, 2011
Collecting all digital data on people could yield key insights into our nature, but it would violate privacy.
In "Too Hard for Science?" I interview scientists about ideas they would love to explore that they don’t think could be investigated. For instance, they might involve machines beyond the realm of possibility, such as particle accelerators as big as the sun, or they might be completely unethical, such as lethal experiments involving people. This feature aims to look at the impossible dreams, the seemingly intractable problems in science. However, the question mark at the end of "Too Hard for Science?" suggests that nothing might be impossible.
The scientist: Duncan Watts, a principal research scientist at Yahoo! Research, where he directs the Human Social Dynamics group. Before joining Yahoo!, he was a full professor of sociology at Columbia University. He is the author, most recently, of "Everything is Obvious: Once You Know the Answer."
The idea: "Imagine if all the data from Facebook, Google, Yahoo!, Foursquare, Twitter, and GroupOn were combined," Watts says. "Now imagine if all that data were combined with all the location data, call and SMS records for all cellular phones. In fact, imagine that everyone had smart phones, and that all the app usage data were also combined. Then imagine combining all that data with data from shoppers club cards, retailer databases, credit agencies, voter registration records, presidential campaign contributions, real estate transactions, credit card transactions."
"All this data already exists, and new technologies and businesses are constantly coming into existence that generate and store more of it," he continues. "Currently, however, it is all fragmented — any individual’s data might be hosted by hundreds or even thousands of different entities, so nobody, not even Facebook, can see more than a sliver of any one person’s digital footprint. But what if we could put all these fragments together to form a coherent whole? In theory at least, this could be done, creating a veritable panopticon of the digital age."
For social science, this would be a huge breakthrough, he notes. "Already, social and computer scientists are busy mining mountains of digital data, much of it derived from email, social networking sites, cell phone data, online games, e-commerce sites, et cetera. But there’s a big problem — because each source of data is collected and stored separately from any other source, it’s generally impossible to connect any one ‘mode’ of activity, such as who you’re friends with, to any other mode, such as what you spend your money on."
"Why does this matter?" he asks. "Let’s say we’d like to measure to what extent friends influence each other’s purchase behavior. It’s a relatively simple question to ask, and one that is of great interest both to social scientists and also to marketers. But to answer it we’d need to be able to observe both the complete friendship network — already a difficult task — and also everyone’s shopping behavior.
"Using current systems, one might obtain an approximation of the friendship network by using Facebook data, or mining email logs, while e-commerce sites or retailer databases may show how much individuals are spending on particular products. But at the moment, it’s extremely difficult to combine even two such sources of data, and of course there are many different modes of communication, and many different places to make purchases."
"Generalizing from this example, you can see how social scientists might learn an enormous amount about human behavior that is currently mysterious, simply by simultaneously observing individual interactions and behaviors, for very large populations, over extended periods of time," Watts says. "Viewed this way, the digital panopticon, or even limited versions of it, might revolutionize social science the way that the telescope revolutionized physics."
The problem: "The very idea of a digital panopticon probably freaks most people out, and rightly so," Watts says. "The original panopticon, remember, was intended to be a prison, designed by the English philosopher Jeremy Bentham. According to Wikipedia, ‘The concept of the design is to allow an observer to observe (-opticon) all (pan-) prisoners without the incarcerated being able to tell whether they are being watched, thereby conveying what one architect has called the sentiment of an invisible omniscience.’ Ouch."
"Obviously privacy is a huge issue for all industries that collect digital information about their consumers, and will probably become ever more so as more and more user data gets collected by various parties," Watts says. "For exactly the same reason that the panopticon would be so powerful a scientific tool — namely that it would put all the pieces together — it raises far more serious questions about individual privacy than anything currently in existence or even in the realm of near-term possibility."
In addition, there might be concerns about what governments might do with such data. "There have been instances where government agencies have requested data from individual providers, but the panopticon idea would require an immensely greater effort, both in scale, that is, volume of data, and also scope, that is, integration of data from different sources," Watts says.
"My feeling is that this would be hard to pull off in practice in a country like the U.S., both technically and also politically," he notes. "Regardless, whether feasible or not, it’s reasonable to worry about it, if only to keep it in the unfeasible realm."
The solution? "In its extreme version, it’s hard to imagine how we could ever trust any one entity to access all the world’s data," Watts says. "I’m not going to say there’s no socially acceptable solution, but I can’t see it happening."
"That said, more limited but still scientifically useful versions might be both possible and also acceptable from a privacy standpoint," he notes. "Even those will have to be designed with great care, and many questions remain unresolved. But I’m optimistic that if we proceed with caution and sensitivity, there is a lot of interesting science that can be done in a socially responsible manner."
If you have a scientist you would like to recommend I question, or you are a scientist with an idea you think might be too hard for science, email me at firstname.lastname@example.org
Follow Too Hard for Science? on Twitter by keeping track of the #2hard4sci hashtag.
About the Author: Charles Q. Choi is a frequent contributor to Scientific American. His work has also appeared in The New York Times, Science, Nature, Wired, and LiveScience, among others. In his spare time he has traveled to all seven continents. Follow him on Twitter @cqchoi.
The views expressed are those of the author and are not necessarily those of Scientific American.