September 18, 2019

Don’t Trust Scientists? Then Help Collect the Data

Citizen science has the potential to reduce data dishonesty

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

In 2015 I was on the verge of publishing my first scientific journal article. The culmination of hundreds of hours spent filming defensive behavior in snakes seemed to be paying off in a big way: an exciting new conclusion about how the rattlesnake's namesake rattle evolved. But there was a problem.

While almost every data point I collected about viper behavior supported our hypothesis—that snakes more closely related to rattlesnakes shake their tails more quickly—one critical species bucked the trend: the cottonmouth. These large venomous snakes from the Southeastern U.S. shook their tails a measly 10 or 15 times per second—half as quickly as most other rattlesnake cousins.

Staring at my computer screen after analyzing the videos, I realized two things. One, cottonmouths were going to complicate an otherwise straightforward story and weaken my conclusions, meaning I might not get my paper published in a top-tier journal. And two, since I was the only person in my lab analyzing this data, it was completely within my power to fudge the numbers.

Modern scientists operate under massive pressure to publish consistently groundbreaking work in prestigious scientific journals. Like me, most of them also do their work with very few people looking over their shoulder, especially during data collection. This dynamic creates a situation where, for many scientists, being dishonest by falsifying data is both potentially lucrative for one's career and surprisingly easy to do. And while most scientists are honest, data falsification does happen.

A new approach to research called "citizen science," however, is changing this dynamic by allowing members of the general public to collect and analyze scientific data. By spreading responsibility for data collection across dozens, hundreds or thousands of different people, and by making this data accessible throughout the research process, citizen science has the potential to stamp out many aspects of data dishonesty in fields that use this kind of data.

While I didn't end up changing the cottonmouth data (you can see them standing out like a sore thumb, or tail, in Figure 4b of my paper), it struck me at the time how much power was wrapped up in just a few numbers on a screen. If I had changed just five data points on a spreadsheet containing hundreds, I could have drastically improved my paper's chance of acceptance into a premier journal, and not a single person would have known. And while I chose not to falsify my data, this kind of thing does happen.

Five years ago, Michael LaCour, a graduate student from the University of California, Los Angeles, published a paper in Science, perhaps the most prestigious scientific journal in the world. The paper made headlines around the country for its remarkable conclusion: in-person interactions with members of the LGBTQ community can lead to long-term changes in people's attitudes about marriage equality.

The paper was a huge step forward for LaCour's field, and, perhaps as a result, he was soon offered a teaching position at Princeton. But just a few months after the paper was published, an in-depth review of LaCour's methodology revealed that the study he described in the paper never actually took place. LaCour made up the data.

In 2015 a cancer researcher at Duke University named Anil Potti was let go after it was revealed that he had tampered with data throughout his career, leading to the retraction of nearly a dozen research papers. And in 2011 Dutch social psychologist Diederik Stapel was found to have falsified data, leading to a probe into work during his tenure as a researcher. At the most recent count, 59 research papers have been retracted, nearly every article published by Stapel across a decade-long career. And these are just the people who happened to get caught.

Data falsification is such a difficult problem in science because in most research labs there are only a handful of researchers, usually graduate students, working together on collecting and analyzing data. Often this task is relegated to a single person. Combine this with the fact that prestigious journal publications are the currency of academia, and that researchers often invest months and years in a project that might never produce useful data, and you have a recipe for problems with fake science.

While there are many measures in place to evaluate the veracity of research, including age-old processes such as peer review and newer initiatives such as publicly available datasets, all of these safeguards are downstream of the actual data collection process. Built into the tradition of academic independence is a less positive tradition of "taking scientists' word for it."

When you read a journal article you are free to criticize a researcher's conclusions, statistical techniques or methodology. But you cannot disagree with the actual data that was collected. In order to verify this step, you would have to replicate the experiment—a tradition oft-lamented for its near total absence from modern academia.

So what if we changed the fundamental way that data is collected? Citizen science presents not only an opportunity to engage the public with science and to collect otherwise inaccessible data. It is also an opportunity for a paradigm shift in the way we collect data. This change would limit the capacity for scientists to falsify data.

Take the citizen science project "iNaturalist," for instance. This platform allows anyone to submit geo-located photographs of wildlife seen anywhere in the world to a publicly accessible online database. The project has exploded in popularity, and just last year hit 15 million different observations of wildlife from nearly every country on earth. Scientists have already used this massive dataset to publish hundreds of new research papers on topics such as migration changes, species declines and the distributions of different colors and morphologies.

Because the observations used in these papers were generated by thousands of strangers on the internet, it would be nearly impossible for a researcher to coerce the data collectors into submitting data that fits a particular agenda.

Of course, using citizen science data introduces its own set of unique problems inherent in a dataset created and maintained by nonprofessionals. For instance, if volunteers are inadequately trained or data is not vetted effectively, data collected by citizen scientists may be of low quality. But if done responsibly, citizen science has the capacity to produce high-quality data wholly outside the influence of any one researcher. And with so many people working on data collection at the same time, the datasets themselves can be larger and more comprehensive than a scientist could ever gather on his or her own.

But embracing citizen science is not only good for the quality of the science being done. With public trust in scientists lagging behind many other professional occupations and scientifically derived conclusions about topics such as climate change and evolution still at odds with the views of the American public, scientists have a lot to do to win over the trust of the people who, by and large, fund most of the research conducted in the country. Citizen science might be a step toward rebuilding this trust, by putting the power of research and science in the hands of everyday people.