Guilty Planet

Cooperation, conservation, and technology.

The pros & cons of Amazon Mechanical Turk for scientific surveys

The days of making a scientific inference about the human psyche from a questionnaire given to 50 undergraduates are over. The online labor market facilitated by Amazon Mechanical Turk, launched in 2005 (more than two centuries after the infamous chess-playing Turkish automaton was unveiled), is now being used to great effect for scientific inquiry. If you are interested in using this tool, I would recommend reviewing the nascent body of literature, watching Harvard post-doc David Rand’s talk at the Berkman Center, and, of course, reading on.

Amazon Mechanical Turk was initially used for tasks that are hard for machines to handle, such as categorizing information and transcribing audio. Amazon uses the Turk platform for an iPhone app in which shoppers photograph a product they want and receive a link to that product on Amazon. Computers are not that sophisticated (yet), just as the 18th-century Turk could not really play chess. Instead, behind that Amazon link is a human being voluntarily matching the iPhone image to an Amazon product in exchange for pennies. Academics are now taking advantage of Turk, and, from my own experience with the difficulties of recruiting students for experiments, I suspect its use will only increase.

In the old days, psychology departments would trade course credit or some other incentive for undergraduates' survey participation. Then came the Internet, and much survey research moved online. Now, with Amazon Mechanical Turk, the days of begging students to participate in surveys seem to be over. A colleague of mine was about to present at a conference and only 6 graduate students had completed her survey; I told her about Turk, and by the next morning she had 30 additional responses (and only 30 because that is where she set her limit). This is one of the biggest pros of Turk: recruitment is entirely painless. Furthermore, you can exclude certain demographic profiles from taking the survey. Using the in-house survey platform is easiest, but scientists have also successfully recruited workers to external sites.
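For researchers recruiting programmatically rather than through Turk's web dashboard, posting a survey amounts to submitting a HIT (Human Intelligence Task) request with a response cap and, optionally, demographic restrictions. The sketch below is illustrative only, not official code: the field names follow Amazon's requester API as I understand it (e.g., as used with boto3's `mturk` client), and the title, reward, and locale qualification ID are assumptions for the example.

```python
# Illustrative sketch: assembling the parameters for a Mechanical Turk
# CreateHIT request that recruits survey respondents. Values and field
# names are placeholders modeled on Amazon's requester API.

def build_survey_hit(reward_usd, max_workers, country="US"):
    """Assemble a HIT request paying `reward_usd` per response, capping
    recruitment at `max_workers` (the 'limit' a researcher sets), and
    restricting the task to workers registered in `country`."""
    return {
        "Title": "10-minute academic survey",
        "Description": "Answer a short questionnaire for research.",
        "Reward": f"{reward_usd:.2f}",        # paid per completed assignment
        "MaxAssignments": max_workers,         # e.g., stop at 30 responses
        "AssignmentDurationInSeconds": 30 * 60,
        "LifetimeInSeconds": 24 * 60 * 60,     # HIT visible for one day
        "QualificationRequirements": [{
            # Built-in "Locale" qualification: exclude demographic
            # profiles by restricting which countries can see the task.
            # (Qualification type ID assumed for illustration.)
            "QualificationTypeId": "00000000000000000071",
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": country}],
        }],
    }

hit = build_survey_hit(reward_usd=0.50, max_workers=30)
# A real request would then be submitted with something like:
#   boto3.client("mturk").create_hit(**hit, Question=survey_xml)
```

The `MaxAssignments` cap is what made my colleague's survey stop at exactly 30 responses; raising it (or the reward) is a one-line change.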

On top of that, Turk labor is cheap. The average wage is $1.40 per hour (of course, you’ll get better work the more you pay). The cost of labor is a bit uncomfortable, and I shiver at the idea of a warehouse of enslaved Amazon Turkers, but so far participation in Turk appears entirely voluntary. A requester (someone who asks for labor) can also refuse to pay for inaccurate work, although workers obviously don't like this (and can reciprocate by giving requesters negative press in the forums).

Turk also overcomes some of the concerns about the usual undergrad demographic. UBC psychologist Joe Henrich and colleagues have pointed out that Western, educated, industrialized, rich and democratic (WEIRD) cultures, and particularly American undergraduates, can display psychologically unusual behavior, especially compared to the other 88% of the world’s population. Turk is subject to similar biases because the service, so far, is available only in English, and to post job requests you must have a U.S. address. However, there is a growing number of Indian workers, and the profile of U.S. subjects tends to be closer to the broader U.S. population than that of university students.

As with any survey, there is always the question of whether people are paying attention. In the lab it is easy to watch people (which also means answers can differ because subjects feel watched). The standard recommendation is to plant questions in the survey that can be used to score attention and validate results. Paolacci et al. (2010) give the example: “While watching the television, have you ever had a fatal heart attack?” If a worker answered 'always' or 'sometimes', the response was discarded. Evidence suggests the rate of attention-check failure on Turk is no higher than in other formats (e.g., the lab or other Internet surveys).
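The discard rule Paolacci et al. describe amounts to a simple filter over the collected responses. A minimal sketch, with response structure and field names of my own invention:

```python
# Minimal sketch of screening responses with a planted attention-check
# question, in the spirit of Paolacci et al. (2010). The data layout
# here is illustrative, not from any particular survey platform.

CHECK_QUESTION = ("While watching the television, "
                  "have you ever had a fatal heart attack?")
FAILING_ANSWERS = {"always", "sometimes"}  # impossible answers to an impossible event

def passes_attention_check(response):
    """Keep a response only if the planted question got a sensible answer."""
    return response.get(CHECK_QUESTION, "").lower() not in FAILING_ANSWERS

responses = [
    {CHECK_QUESTION: "Never", "q1": 4},       # attentive: keep
    {CHECK_QUESTION: "Sometimes", "q1": 2},   # inattentive: discard
]
valid = [r for r in responses if passes_attention_check(r)]
```

Scoring happens after collection, so workers cannot tell which question is the trap; several such questions scattered through a long survey give a more robust attention score than one.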

So far, some indicators suggest Turk is a trustworthy source. Rand (2011) used IP address logging to verify subjects’ self-reported country of residence, and found that 97% of responses are accurate. He also compared the consistency of a range of demographic variables reported by the same subjects across two different studies, and found between 81% and 98% agreement, depending on the variable.

I wonder whether, as Turk grows, the wage will increase or decrease. Can we expect a Turk union? Will Amazon eventually institute a minimum wage? Will the system remain as reliable? What will stop workers from opening multiple accounts with multiple different profiles to maximize earnings? I also have questions about how Amazon Mechanical Turk can handle multi-player experimental games, and how we can ensure players are not seeing or talking to one another.

Labs for human research, where variables are easier to control, will remain necessary for all sorts of reasons, such as experiments involving interactions among three or more participants or rewards that are not monetary. But when it comes to surveys that were traditionally administered in the lab or over a lab's website, Turk is a great improvement and a promising tool. We can expect to see 'Amazon Mechanical Turk' among the keywords of many academic papers to come.

References:

G. Paolacci, J. Chandler, & P.G. Ipeirotis (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making 5(5): 411-419.

D.G. Rand (2011). The promise of Mechanical Turk: How online labor markets can help theorists run behavioral experiments. Journal of Theoretical Biology.

The views expressed are those of the author and are not necessarily those of Scientific American.