2012 United States presidential election results by county, on a color spectrum from Democratic blue to Republican red. (Credit: Mark Newman, Department of Physics and Center for the Study of Complex Systems, University of Michigan)

“Clinton crushes Biden in hypothetical 2016 matchup: Poll.” This was the headline of an MSNBC article on July 17, a full two years before the election in question. In the fine print, NBC reported that the margin of error was around 2 to 5 percent, which would appear to be small enough to trust the findings. But should we trust that Hillary Clinton is certain to win the nomination?

270ToWin.com already has an entire list of matchups pitting Clinton against all the potential Republican candidates, and it has Clinton winning in almost every one, but that does not necessarily mean she’ll be president in three years. The key thing to understand is that the margin of error does not always describe the true error inherent in the poll, so polls that boast a small error can end up being completely wrong.

The concept of polling rests on the assumption that the opinions of the people sampled in the poll accurately represent the distribution of opinions across the entire population, which can never be completely true. The “margin of error” describes the uncertainty that comes from surveying only a sample of the population rather than everyone in it. In general, the more people are surveyed, the smaller the margin of error. But this doesn’t take into account another key source of error called “biased sampling.” The fact that a poll samples a lot of people does not mean that it does so in the truly random fashion that would be needed to extrapolate results to the larger population. Unfortunately, many polls fall victim to a number of biases that significantly skew their results despite their small margin of error.
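To make the link between sample size and margin of error concrete, here is a minimal sketch using the standard textbook approximation for a proportion at a 95 percent confidence level. The function name and the sample sizes are chosen only for illustration, and the formula assumes a truly random sample, which is exactly the assumption that biased sampling violates.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of n people (z=1.96 for 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes, not taken from any real poll.
for n in (100, 1000, 10000):
    print(f"n={n:>6}: +/- {100 * margin_of_error(n):.1f} points")
```

Under these assumptions the margin shrinks from roughly 10 points at 100 respondents to about 3 points at 1,000 and about 1 point at 10,000. Nothing in the formula knows how the respondents were chosen, so it says nothing about bias.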

The most common bias, known as convenience sampling, occurs when pollsters select people to survey using a convenient, but not entirely random, strategy. A well-known historical example is the 1936 Roosevelt-Landon presidential election. The Literary Digest touted its polls as the most accurate because they sampled a large number of people and the margin of error was less than 1 percent. Its polls all concluded that Landon was sure to win, which didn’t happen. According to Wayne Journell and P. Holt Wilson, in their article “Lies, Damn Lies, and Statistics: Uncovering the Truth Behind Polling Data,” it turns out that the magazine found people to poll using car registration and phone numbers at a time when only the wealthier people had phones and cars. Because these wealthier people tended to vote Republican, the sampling method drastically skewed the results of its polls.
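The effect can be sketched with a small simulation. Every number below is invented for illustration and is not historical data: a hypothetical electorate in which 30 percent of voters own a phone or car, with that group and everyone else supporting candidate A at different rates. The sketch compares a huge poll drawn only from the reachable group with a much smaller poll drawn at random.

```python
import math
import random

# Hypothetical population, invented for illustration only.
WEALTHY_SHARE = 0.30    # fraction of voters with a phone or car
SUPPORT_WEALTHY = 0.35  # fraction of that group backing candidate A
SUPPORT_OTHERS = 0.65   # fraction of everyone else backing candidate A

def true_support():
    """Candidate A's actual support across the whole population."""
    return (WEALTHY_SHARE * SUPPORT_WEALTHY
            + (1 - WEALTHY_SHARE) * SUPPORT_OTHERS)

def poll(n, only_wealthy, rng):
    """Survey n people; return (estimated support, 95% margin of error).

    If only_wealthy is True, every respondent comes from the wealthy
    group, mimicking a sample drawn from phone and car lists."""
    votes = 0
    for _ in range(n):
        wealthy = True if only_wealthy else rng.random() < WEALTHY_SHARE
        p = SUPPORT_WEALTHY if wealthy else SUPPORT_OTHERS
        if rng.random() < p:
            votes += 1
    estimate = votes / n
    moe = 1.96 * math.sqrt(estimate * (1 - estimate) / n)
    return estimate, moe

rng = random.Random(1936)  # fixed seed so runs are repeatable
big_biased = poll(100_000, only_wealthy=True, rng=rng)
small_random = poll(1_000, only_wealthy=False, rng=rng)

print(f"true support:       {true_support():.3f}")
print(f"biased, n=100000:   {big_biased[0]:.3f} +/- {big_biased[1]:.3f}")
print(f"random, n=1000:     {small_random[0]:.3f} +/- {small_random[1]:.3f}")
```

In this toy setup the large poll reports a margin of error well under 1 point yet lands far from the true figure, while the small random poll has a wider margin but lands much closer. More respondents shrink the margin of error; they do nothing to correct who was left out.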

The Democratic and Republican presidential candidates of 2012. (Credit: DonkeyHotey via Wikimedia Commons)

The 2012 race between Barack Obama and Mitt Romney provided a more recent example of convenience-sampling bias. Obama eventually won 51.1 percent of the popular vote. However, the final polls from Rasmussen Reports had Romney favored to win in most states. Why did it predict the wrong candidate to win? There could be lots of different reasons, but one hypothesis is convenience-sampling bias. Rasmussen Reports mostly finds its sample group through landline phones, which many people no longer use. For those who do not have landline phones, Rasmussen uses an online survey. There are a couple of problems with this methodology. First, this means the company can only reach people who have landline phones or Internet access. Yet, according to Nate Silver, the founder and editor of FiveThirtyEight, 23 percent of adults do not have a landline, 4 percent don’t answer their landline and 2 percent don’t have a phone at all. So Rasmussen’s method could well bias the poll towards the wealthier and older segments of the population that still use landlines, both of which tend to vote Republican.

Another possible source of polling error is known as volunteer bias, wherein the people who volunteer their opinions to a poll do not represent the distribution of the entire population. For example, among the group that has landline phones, some percentage respond to a polling call by simply hanging up, an outcome made socially easier by the practice of polling through an automated message, as Rasmussen does. The people who take the time to listen to the automated message and respond accordingly are those who strongly feel that their opinion must be heard. According to Charles Seife, in his book Proofiness: How You’re Being Fooled by the Numbers, for presidential elections with an incumbent, the people who are very vocal about their opinions are typically those who would vote against the incumbent, because they are generally unhappy with the status quo and feel that there needs to be a change. Those who would vote for the incumbent tend to be happier with the state of the country and therefore do not feel as strongly that their opinions need to be heard.
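A little arithmetic shows how much a gap in response rates can matter. The sketch below is purely hypothetical: the function name, the true level of support and both response rates are made up for illustration and do not come from Seife or from any pollster.

```python
def observed_share(true_share, rate_supporters, rate_opponents):
    """Share of poll respondents backing the incumbent when supporters
    and opponents answer the poll at different rates."""
    supporters = true_share * rate_supporters
    opponents = (1 - true_share) * rate_opponents
    return supporters / (supporters + opponents)

# Hypothetical: 52% truly back the incumbent, but only 8% of them
# stay on the line, versus 12% of the incumbent's opponents.
share = observed_share(0.52, rate_supporters=0.08, rate_opponents=0.12)
print(f"incumbent's share among respondents: {share:.1%}")
```

With these made-up numbers, an incumbent who actually holds 52 percent support shows up with only about 42 percent among the people who respond. If both groups responded at the same rate, the function would return the true 52 percent.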

Rasmussen’s use of an online survey further exacerbates volunteer bias. It takes more time and energy to fill out an online survey than to give answers to someone on the phone, decreasing the chance that people who are undecided or only leaning slightly one way will respond. It is also less personal, making it easier for people to ignore. These drawbacks of the online survey can introduce a large amount of error into the poll.

The time at which polls are conducted also affects their accuracy. Nate Silver, in his book The Signal and the Noise, reported that the accuracy of polls drastically increases as the election gets closer. A candidate with a 5-point lead in a poll taken a year before the election went on to win only 59 percent of the time, compared with 95 percent for the same lead the day before the election. This timing bias, if you will, was clearly on display in the 2012 presidential election, so it pays to be skeptical about the polls that are coming out now making predictions for 2016.

After the midterm elections today, more polls will be conducted and more data will be spewed onto national news networks. To successfully sift through the massive amounts of data, we must keep in mind the error inherent in all polls due to sample size and methodology. Polling data is rarely perfect and often inconclusive or misleading, so it pays to look closely at the details.