When the Big Lie Meets Big Data

The same techniques used to identify and motivate likely voters can also be used to spread false information

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American.


Joseph Goebbels is often credited with the maxim, “If you tell a lie big enough and keep repeating it, people will eventually come to believe it.” In the era of big data, however, numerous smaller lies, guided by machine learning, may be more effective than a few big ones. Even the literal truth, seasoned with innuendo, will do: studies show that the salient concept in a statement can persist and dominate the literal truth in which it is embedded. For example, consider the statement, “There is no evidence that Hillary Clinton ran a child sex slavery operation out of Comet Pizza.” Repeat it often enough, and people are prone to remember just the “Hillary Clinton… child sex slavery” concept, not its negation.

President Trump’s disregard for the truth often seems impulsive rather than strategic, more like that of a braggart in a bar (though he is, in fact, a teetotaler). Reacting angrily to news reports that his inauguration crowd was not as big as that of President Obama when he first took office, or even as big as the protesting crowd gathered by the Women’s March, Trump insisted (and had his press secretary insist) that his inaugural crowds were the biggest in U.S. history (not true). During the campaign, he claimed to have seen people in Jersey City celebrating the 9/11 attacks by dancing in the street (also not true). After the election, he claimed that millions of votes by illegal immigrants cost him the popular vote (no evidence).

President Trump’s enthusiastic embrace of casual lying is seen partly as a reflection, and partly as a product, of the general phenomenon of “fake news.” The impact of fake news is real. Comet Pizza, a restaurant in Washington, D.C., was indeed believed by many to be the site of a child sex trafficking ring operating with the involvement of Hillary Clinton. It was the scene of a bizarre siege on Dec. 4, 2016, when Edgar Welch, heavily armed, entered the restaurant and announced that he had come to investigate. The sex-slave story was pure fiction, but it was all over the internet; Welch believed it and traveled from North Carolina to D.C., ready to do battle.


Where does fake news come from? It turns out there is a profitable cottage industry in creating and purveying fake news. NPR tracked down one fake-news creator, Jestin Coler, in a Los Angeles suburb. Coler started out catering to the alt-right, and found a rapidly expanding audience during the recent presidential campaign—“a huge Facebook group of kind of rabid Trump supporters just waiting to eat up this red meat.”

Jestin Coler is a Democrat, and says he wants to expose the phenomenon of fake news. But the money is good (over $100,000 per year), and in the end what he wanted was for people to click on his stories so he could collect advertising revenue.

But suppose you were interested in getting someone to do something more than click? Something political—like vote for a particular candidate, go to a rally, write your Congressman, etc.? That’s where Big Data comes into the picture.

Since 2004, political consultants have used big-data models to predict how people will vote and to decide whether they should be sent messages encouraging them to turn out (and if so, which messages). They use randomized experiments (A/B tests) to measure the effect of different messages at the individual level, and combine the results with other variables, such as demographic and voting data, to build predictive models. All this is similar to what happens in marketing (e.g., should a given consumer be sent solicitation A or B?), and President Obama’s campaigns were pioneers in the use of predictive analytics to target individual voters.
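
To make the mechanics concrete, here is a minimal sketch of that A/B-test-plus-model workflow in Python, on entirely synthetic data. The predictor variables, effect sizes, and the scikit-learn “two-model” uplift approach are my own illustrative assumptions, not a description of any campaign’s actual pipeline.

```python
# Sketch: A/B test plus per-arm response models for voter targeting.
# All variables and numbers are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical predictors: age, past-turnout propensity, party affinity.
X = np.column_stack([
    rng.uniform(18, 90, n),   # age
    rng.random(n),            # past turnout propensity
    rng.random(n),            # party affinity
])

# Randomly assign each voter to message A (0) or message B (1): the A/B test.
msg = rng.integers(0, 2, n)

# Simulated outcome: message B works better on younger, low-turnout voters.
logit = -1.5 + 2.0 * X[:, 1] + msg * (1.0 - X[:, 0] / 90 - X[:, 1])
voted = rng.random(n) < 1 / (1 + np.exp(-logit))

# Fit one response model per experimental arm (a "two-model" uplift approach).
model_a = LogisticRegression().fit(X[msg == 0], voted[msg == 0])
model_b = LogisticRegression().fit(X[msg == 1], voted[msg == 1])

# Estimated uplift of B over A for each voter: send B where it is positive.
uplift = model_b.predict_proba(X)[:, 1] - model_a.predict_proba(X)[:, 1]
print("send message B to", (uplift > 0).sum(), "of", n, "voters")
```

The key design point is that the randomized assignment lets the models estimate, voter by voter, which message causes the desired behavior, rather than merely which voters are likely to act anyway.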

The science of predictive modeling has come a long way since 2004. Statisticians now build “personality” models and tie them to other predictor variables. Edgar Welch can now be targeted for messaging not simply on the basis of his demographics and voting behavior, but on the basis of a personality classification derived from reams of detailed personal data available for purchase. One such model bears the acronym “OCEAN,” for the personality characteristics (and their opposites) of openness, conscientiousness, extraversion, agreeableness, and neuroticism. Using Big Data at the individual level, machine learning methods might classify a person as, for example, “closed, introverted, neurotic, not agreeable, and conscientious.”
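
As a rough illustration of the kind of model being described, here is a sketch, again on synthetic data, of a classifier that scores individuals as high or low on each of the five OCEAN traits. The behavioral features, training labels, and model choice are all hypothetical assumptions for the example, not the actual system used by any firm.

```python
# Sketch: scoring individuals on the five OCEAN traits from behavioral data.
# Features, labels, and model choice are hypothetical illustrations only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(1)
n, n_features = 2_000, 40

# Hypothetical purchased data: page likes, purchases, subscriptions, etc.,
# encoded here as 40 binary behavioral indicators per person.
X = rng.integers(0, 2, (n, n_features))

# Hypothetical training labels: high (1) or low (0) on each OCEAN trait
# (in practice these would come from survey takers who also share their data).
traits = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]
y = rng.integers(0, 2, (n, len(traits)))

# One classifier per trait, wrapped as a single multi-output model.
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=100,
                                                   random_state=0))
clf.fit(X, y)

# Classify a new individual, e.g. "introverted, neurotic, conscientious."
profile = clf.predict(rng.integers(0, 2, (1, n_features)))[0]
print({t: ("high" if p else "low") for t, p in zip(traits, profile)})
```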

Alexander Nix, CEO of Cambridge Analytica (owned in large part by the family of Robert Mercer, a major Trump donor), says he has thousands of data points on you and every other voter: what you buy or borrow, where you live, what you subscribe to, what you post on social media, and so on. At a recent Concordia Summit, using the example of gun rights, Nix described how messages can be crafted to appeal specifically to you, based on your personality profile. Are you highly neurotic and conscientious? Nix suggests the image of a sinister gloved hand reaching through a broken window.

In his presentation, Nix noted that the goal is to induce behavior, not to communicate ideas. So where does truth fit in? Johan Ugander, an assistant professor of management science and engineering at Stanford, suggests that, for Nix and Cambridge Analytica, it doesn’t. In counseling the hypothetical owner of a private beach on how to keep people off his property, Nix eschews the merely factual “Private Beach” sign, advocating instead a lie: “Sharks sighted.” Ugander, in his critique, cautions all data scientists against “building tools for unscrupulous targeting.”

The warning is needed, but may be too late. What Nix described in his presentation involved carefully crafted messages aimed at his target personalities. His messages pulled subtly on various psychological strings to manipulate us, and they obeyed no boundary of truth, but they required humans to create them. The next phase will be the gradual replacement of human “craftsmanship” with machine learning algorithms that can supply targeted voters with a steady stream of content (from whatever source, true or false) designed to elicit desired behavior; one simple form such an algorithm could take is sketched below.
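
Purely as an illustration of the principle, here is a toy epsilon-greedy “bandit” algorithm that learns, segment by segment, which message variant produces the most responses, with no notion of whether the message is true. The personality segments, message variants, and response rates are invented for the example.

```python
# Sketch: an epsilon-greedy bandit that learns which message variant each
# personality segment responds to. All segments, variants, and rates are
# invented; this only illustrates automated content optimization.
import numpy as np

rng = np.random.default_rng(2)
segments = ["neurotic-conscientious", "open-extraverted"]
variants = ["fear appeal", "tradition appeal", "community appeal"]

# True response rates per segment and variant (unknown to the algorithm).
true_rates = {"neurotic-conscientious": [0.12, 0.06, 0.04],
              "open-extraverted":       [0.03, 0.05, 0.10]}

clicks = {s: np.zeros(len(variants)) for s in segments}
shows = {s: np.zeros(len(variants)) for s in segments}

for _ in range(20_000):
    seg = segments[rng.integers(len(segments))]
    if rng.random() < 0.1:    # explore a random variant 10% of the time
        arm = int(rng.integers(len(variants)))
    else:                     # otherwise exploit the best variant so far
        arm = int(np.argmax(clicks[seg] / np.maximum(shows[seg], 1)))
    shows[seg][arm] += 1
    clicks[seg][arm] += rng.random() < true_rates[seg][arm]

for seg in segments:
    best = variants[int(np.argmax(clicks[seg] / np.maximum(shows[seg], 1)))]
    print(f"{seg}: learned to send the {best}")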

Cognizant of the Pandora’s box that data scientists have opened, the scholarly journal Big Data has issued a call for papers for a future issue devoted to “Computational Propaganda.” Hopefully, it will address broader ethical and policy issues, and not be a “how to” manual.

Peter Bruce founded The Institute for Statistics Education at Statistics.com in 2002. He is a co-author of "Data Mining for Business Analytics" (Wiley) and "Practical Statistics for Data Scientists: 50 Essential Concepts" (O'Reilly, 2017), the author of "Introductory Statistics and Analytics: A Resampling Perspective" (Wiley), and the co-developer of Resampling Stats software.
