Guilty Planet

Cooperation, conservation, and technology.
The Pros & Cons of Amazon Mechanical Turk for Scientific Surveys

The views expressed are those of the author and are not necessarily those of Scientific American.


The days of making a scientific inference about the human psyche from the results of a questionnaire given to 50 undergraduates are over. The online labor market facilitated by Amazon Mechanical Turk, founded in 2005 (more than two centuries after the infamous chess-playing automaton known as the Turk was built), is now widely used for scientific inquiry. If you are interested in using this tool, I recommend reviewing the nascent body of literature, watching Harvard post-doc David Rand’s talk at the Berkman Center, and, of course, reading on.

Amazon Mechanical Turk was initially used for tasks that are hard for machines to handle, such as categorizing information and transcribing audio to text. Amazon uses the Turk platform for its iPhone app, in which potential shoppers take a photo of a product they want and receive a link to that product on Amazon. Computers are not that sophisticated (yet), just as the 18th-century Turk could not really play chess. Instead, behind that Amazon link is a human being voluntarily matching the iPhone image with an Amazon product in exchange for pennies. Academics are now taking advantage of Turk and, given my own experience with the difficulties of recruiting students for experiments, I suspect Turk’s use will only increase.

In the old days, psychology departments would offer undergraduates course credit or some other incentive in exchange for completing surveys. Then came the Internet, and a lot of survey research moved online. Now, with Amazon Mechanical Turk, it seems the days of begging students to participate in surveys are over. A colleague of mine was about to present at a conference, and only 6 graduate students had completed her survey. I told her about Turk, and by the next morning she had an additional 30 responses (and only because 30 is where she set her limit). This is one of the biggest pros of Turk: recruitment is entirely painless. Furthermore, you can exclude certain demographic profiles from taking the survey. Using Turk’s in-house survey platform is easiest, but scientists have also successfully recruited workers to an external site.

On top of that, Turk labor is cheap. The average wage is $1.40 per hour (of course, you’ll get better work the more you pay). The low cost of labor is a bit uncomfortable, and I shiver at the idea of a warehouse of enslaved Amazon Turkers, but so far participation in Turk appears to be entirely voluntary. A requester (someone who asks for labor) can also refuse to pay if work is inaccurate, although workers obviously don’t like this (and can reciprocate by giving requesters negative press in the forums).

Turk also overcomes some of the concerns about the usual undergrad demographic. UBC psychologist Joe Henrich and colleagues have pointed out that people from Western, educated, industrialized, rich and democratic (WEIRD) cultures, and particularly American undergraduates, can display psychologically unusual behavior, especially compared to the other 88% of the world’s population. Turk is subject to similar biases because the service, so far, is available only in English, and you must have a U.S. address to make job requests. However, there is a growing number of Indian workers, and the profile of U.S. subjects tends to be closer to the broader U.S. population than that of university students.

As with any survey, there is always the question of whether people are paying attention. In the lab, it is easy to watch people (which means that answers can also differ due to the feeling of being watched). The recommendation is to plant questions in the survey that can be used to score attention and validate results. Paolacci et al. (2010) give as an example: “While watching the television, have you ever had a fatal heart attack?” If a worker answers ‘always’ or ‘sometimes’, the researchers discard that worker’s survey. Evidence suggests that the rate of failed attention checks on Turk is no higher than in other formats (e.g., lab, other internet surveys).
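
For readers who handle their survey exports in code, the discard rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field name `fatal_heart_attack`, the worker IDs, and the sample answers are all hypothetical, and Paolacci et al. do not prescribe any particular implementation.

```python
# Answers to the planted question that are impossible, per the example
# above: nobody has "always" or "sometimes" had a fatal heart attack.
FAILING_ANSWERS = {"always", "sometimes"}


def passes_attention_check(response, key="fatal_heart_attack"):
    """Return True unless the planted question got an impossible answer.

    `key` is a hypothetical column name. A missing answer passes here;
    a stricter rule could discard those responses as well.
    """
    answer = response.get(key, "").strip().lower()
    return answer not in FAILING_ANSWERS


def filter_attentive(responses, key="fatal_heart_attack"):
    """Keep only the responses that pass the attention check."""
    return [r for r in responses if passes_attention_check(r, key)]


# Made-up responses, for illustration only.
responses = [
    {"worker": "w1", "fatal_heart_attack": "Never"},
    {"worker": "w2", "fatal_heart_attack": "Sometimes"},
    {"worker": "w3", "fatal_heart_attack": "always"},
]

kept = filter_attentive(responses)
print([r["worker"] for r in kept])  # prints ['w1']
```

Normalizing case and whitespace before comparing keeps the rule from missing an answer such as "Sometimes " that differs only in formatting.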

So far, some indicators suggest Turk is a trustworthy source. Rand (2011) used IP address logging to verify subjects’ self-reported country of residence, and found that 97% of responses were accurate. He also compared the consistency of a range of demographic variables reported by the same subjects across two different studies, and found between 81% and 98% agreement, depending on the variable.
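
A consistency check of this kind comes down to a simple ratio: among the workers who appear in both studies, what share gave the same answer for a given variable? The Python sketch below shows one way to compute it. The data layout, the variable names, and the numbers are invented for illustration; they are not Rand's data, and this is not necessarily how he computed his figures.

```python
def agreement_rate(study_a, study_b, variable):
    """Share of workers in both studies who gave the same answer for `variable`.

    Each study is a dict mapping a worker ID to a dict of self-reported
    answers (a hypothetical layout). Returns None when no worker took part
    in both studies.
    """
    shared = [w for w in study_a if w in study_b]
    if not shared:
        return None
    matches = sum(
        1 for w in shared if study_a[w][variable] == study_b[w][variable]
    )
    return matches / len(shared)


# Made-up self-reports from two studies, for illustration only.
study_a = {
    "w1": {"gender": "F", "income": "low"},
    "w2": {"gender": "M", "income": "mid"},
    "w3": {"gender": "F", "income": "high"},
    "w4": {"gender": "M", "income": "low"},
}
study_b = {
    "w1": {"gender": "F", "income": "low"},
    "w2": {"gender": "M", "income": "high"},  # income differs from study_a
    "w3": {"gender": "F", "income": "high"},
    "w4": {"gender": "M", "income": "low"},
    "w5": {"gender": "F", "income": "mid"},   # not in study_a, so ignored
}

for variable in ("gender", "income"):
    print(variable, agreement_rate(study_a, study_b, variable))
# prints: gender 1.0
#         income 0.75
```

Some disagreement is to be expected even from honest workers, since variables such as income can genuinely change between two studies.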

I wonder whether, as Turk grows, the wage will increase or decrease. Can we expect a Turk union? Will Amazon eventually institute a minimum wage? Will the system be as reliable? What will stop workers from opening multiple accounts with different profiles so that they can maximize earnings? I also have questions about how Amazon Mechanical Turk can handle multi-player experimental games and how we can ensure players are not seeing or talking to one another.

For these reasons, labs for human research, where the variables are easier to control, will remain necessary, particularly for interactions among three or more participants and for experiments where the rewards are not monetary. But when it comes to research done with surveys that were traditionally administered in the lab or over a lab’s website, Turk is a great improvement and a promising tool. We can expect to see ‘Amazon Mechanical Turk’ as keywords in many academic papers to come.

References:
G. Paolacci, J. Chandler, & P.G. Ipeirotis (2010) Running experiments on Amazon Mechanical Turk. Judgment and Decision Making 5(5): 411-419.

D.G. Rand (2011) The promise of Mechanical Turk: How online labor markets can help theorists run behavioral experiments. Journal of Theoretical Biology.

About the Author: Jennifer Jacquet (jenniferjacquet.com) is a postdoctoral researcher at the University of British Columbia researching cooperation and the tragedy of the commons. Follow on Twitter @guiltyplanet.

2 Comments

  1. jakabok 5:10 am 09/19/2011

    While participation in mturk may be voluntary there are many people who do so because they need the money to pay for food and bills. It isn’t just bored, provided for housewives or other such demographics doing this. That $1.60 an hour is ridiculous, and the stinginess of academic “requestors” is a common complaint amongst turkers. I currently turk because I am unemployed, and I started off doing the various scientific surveys with fervor simply because I enjoyed the thought of helping to advance science – as a nebulous sort of term. I have been soured on the experience and feel as if I am being offered a pittance. While attention check questions help to weed out the truly careless I have yet to come across one that isn’t easy to spot if you know there’s going to be an attention check… and there usually are. So as such, it isn’t that hard to not really give very much effort to the questions. An acknowledgement that there are real people using their real time on the other end of the computer, using those real funds that are so typically sparse by providing something resembling fair compensation for their time and effort would be grand.
  2. HitsWorthTurkingFor 2:38 am 01/5/2012

    The idea of a Turk union is an interesting concept. Turkers get paid beans now, sometimes wages as low as $2 per hour – way below minimum wage. Yet, oddly enough, AMT’s main demographic is college educated adults, many with full time jobs.

    Nevertheless, I think people deserve better, and I think many Requesters are taking advantage of us Turkers. That’s right, “us” – I am a turker as well, but I’ve decided to do something about it. I rate the quality of HITs based upon one essential quality – pay for time spent. If it pays less than $0.10 a minute, it’s a bad HIT, more than that, then it’s a good HIT.

    Right now, it’s not much, but I hope to do a lot more with it in the future. If you want to make a little extra pocket money online at a “reasonable” rate, you should check out my blog.

    http://hitsworthturkingfor.blogspot.com/
