May 23, 2011

The Data Are In Regarding Satoshi Kanazawa

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American.

A Hard Look at Last Week's "Objective Attractiveness" Analysis in Psychology Today

"If what I say is wrong (because it is illogical or lacks credible scientific evidence), then it is my problem. If what I say offends you, it is your problem." – Satoshi Kanazawa

Satoshi Kanazawa has a problem.

It is hard to believe that I first encountered Satoshi Kanazawa merely a week ago today; given all that I have read, thought and talked about him this week, it feels like a year. For those of you who haven't been following this saga online, or aren't regular readers of Psychology Today: last Sunday, Satoshi Kanazawa, PhD, an evolutionary psychologist and professor at the London School of Economics, posed (and purported to answer) an incendiary question on his Psychology Today blog: "Why Are Black Women Less Physically Attractive Than Other Women?"

Though the post has been removed from the site, you can now see it here. In the post, Kanazawa promises his readers a scientific analysis of public data showing objective evidence of Black women's status as the least attractive group among all humans. In other words, he promises to wave a magic wand, say "Factor Analysis!" and make racist conclusions appear before your (bluest) eyes.

As it turns out, Kanazawa is a repeat offender, with years of roundly criticized and heartily debunked pseudoscientific shock-jockery under his belt. Despite this, he still posts on the blog of a reputable mainstream publication, still teaches at a respected university and still serves on the editorial board of one of his discipline's peer-reviewed research journals. That may not last much longer. This post's racist hypothesis offended many and unleashed righteous outrage across the internet, as social media users raced to blog, tweet and even petition for Psychology Today to remove Kanazawa as a contributor to its Web site and magazine. Psychology Today removed the post late Sunday night, and on Monday morning the largest student organization in London (representing 120,000 students) unanimously called for Kanazawa's dismissal.

Over the past week, a handful of Kanazawa's fellow bloggers at Psychology Today have posted insightful and at times scientifically grounded critiques of his research question and methodology. Dr. Scott Barry Kaufman has even done an independent statistical analysis of the data set Kanazawa uses to "prove" his theory, beating me to publication by a couple of days but reaching the same conclusions I drew from my own independent analysis.

Independent evaluation of an article's data analysis is a critical step in deconstructing scientific inquiry, and one the mainstream media rarely undertakes. As the founder of a science journalism nonprofit, and therefore an aspiring entrant into the mainstream media ranks, I find this alarming. Whether we agree with Kanazawa's assertion or are horrified by it, we cannot report on it without actually comparing his hypothesis to the evidence. Yet, as the London Guardian warned us back in 2005:

...[s]tatistics are what causes the most fear for reporters, and so they are usually just edited out, with interesting consequences. Because science isn't about something being true or not true: that's a humanities graduate parody. It's about the error bar, statistical significance, it's about how reliable and valid the experiment was, it's about coming to a verdict, about a hypothesis, on the back of lots of bits of evidence.

In his blog post, Kaufman, who is not a journalist, and his co-author on the post, Jelte Wicherts (who also wrote up a much more complete, technical analysis of the dataset here), did a reporter's job. They explained why Kanazawa's statistical analysis was bunk and independently analyzed the Add Health data set (freely available here or here for anyone to analyze!), finding that Kanazawa's conclusion that Black women are the least attractive was incorrect, even if you buy into his idea that the Add Health data set is a reasonable sample on which to ground such an assessment. See Kanazawa's graph, which is magical thinking in the guise of factor analysis:

and Kaufman's graph, which makes sense:

Like Kaufman, I take great issue with Kanazawa's use of a study on adolescent health and behavior to explain human attractiveness or the lack thereof. The Add Health study begins tracking its participants at the age of twelve, and Kaufman wisely limits his analysis to participants who could reasonably be considered adults.

I am disturbed that the Add Health study's adult interviewers even answered the question of how attractive they found these youth. I am even more deeply disturbed by the idea that we are to extrapolate a general theory of desirability from these adult interviewers' subjective assessments of the children's attractiveness. Kaufman's analysis may be correct, but having run the analysis myself, I feel even more strongly that this data set is a completely inappropriate basis for Kanazawa's conclusions.

Brian Hughes, of The Science Bit, agrees. His critique, which is a worthwhile read, focuses on the absence of race and sex data for the interviewers and on the ambiguity around how many interviewers were used. As Hughes points out, the Add Health data set reports neither the interviewer's race nor any other facts about the interviewer, so there is no data to analyze that could tell us, for example, whether interviewers preferred interviewees of their own race.

As Robert Kurzban comments in his Psychology Today blog retort to Kanazawa, "Rhodes et al. (2005) argued that if people prefer faces that constitute an average of the faces that they experience, then, as they put it, faces 'should be more attractive when their component faces come from a familiar, own-race population.' They indeed showed some evidence for an 'own race' effect." Knowing the race of both the interviewer and the interviewee would therefore let us test whether this effect held in the Add Health ratings, and perhaps add to the body of scholarly knowledge.

Kaufman and other bloggers also address Kanazawa's painful contortion of factor analysis, which I agree is laughable. He takes three ratings of the same item, recorded at three different time points, and fits a one-factor model, with the one factor being "objective attractiveness." This, of course, rests on the assumption that an attractiveness rating handed out by interviewers in a study on adolescent health and well-being actually measures something we can agree is "objective attractiveness."

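He then claims that by merging the three measurements for each interviewee into one factor, he can use factor analysis to isolate that "objective attractiveness" while minimizing error. This is just plain false. Factor analysis cannot get rid of measurement error. At most, it separates the variation the three ratings have in common from the variation unique to each one, and any bias the ratings share, such as interviewers' common tastes or prejudices, ends up inside the factor itself. If factor analysis could purge error, we'd all be using it all the time, we'd get rid of all measurement error, and scientific studies wouldn't need to be replicated.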

What his factor analysis might be saying is that, over time, individuals were rated relatively consistently by interviewers on what the study called attractiveness. Without knowing anything about the interviewers, we have no idea whether that consistency is meaningful. The beauty, and danger, of factor analysis is that the statistician running the analysis decides how many factors to extract and what to name them, and there are infinitely many factor solutions to any given problem, or at least no unique one.
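To see why consistency is not the same as objectivity, consider a minimal simulation sketch. It is not Kanazawa's model and uses no Add Health data; every name and number in it (`true_score`, `group`, `shared_bias`, the bias of 0.8, the noise of 0.7) is a hypothetical assumption. Two groups have identical distributions of some underlying trait, but every rater marks one group down by the same amount. As a simple stand-in for a one-factor model, the code takes the first principal component of the three standardized ratings.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical assumptions, not Add Health values
true_score = rng.normal(0.0, 1.0, n)  # the trait we wish we could measure
group = rng.integers(0, 2, n)         # two groups with identical true-score distributions
shared_bias = -0.8 * group            # every rater marks group 1 down by the same amount

# Three ratings per person (e.g., one per wave), each with its own independent noise
ratings = np.column_stack(
    [true_score + shared_bias + rng.normal(0.0, 0.7, n) for _ in range(3)]
)

# Stand-in for a one-factor model: first principal component of the standardized ratings
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvalues come back in ascending order
loadings = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
loadings = loadings * np.sign(loadings.sum())  # orient so higher ratings give higher scores
factor_score = z @ loadings

# The ratings agree with one another, so the composite looks "reliable"...
inter_rating_r = corr[np.triu_indices(3, k=1)].mean()

# ...but it reproduces the shared bias as a group gap that the true trait does not have
true_gap = true_score[group == 1].mean() - true_score[group == 0].mean()
factor_gap = factor_score[group == 1].mean() - factor_score[group == 0].mean()

print(f"mean inter-rating correlation: {inter_rating_r:.2f}")
print(f"true-score gap between groups: {true_gap:.2f}")
print(f"factor-score gap between groups: {factor_gap:.2f}")
```

Under these assumptions the three ratings correlate strongly, so the composite looks dependable, yet it turns the raters' shared bias into a group difference that does not exist in the underlying trait. A formal maximum-likelihood factor analysis would not fix this, because a bias common to all three ratings is, by construction, part of what they have in common.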

Kanazawa continues by comparing women's mean attractiveness ratings, again as assigned by interviewers, across racial groups (which somehow omit "Hispanic," even though the rest of the study data include that category). Seeing differences among these group means, he concludes that race is the reason interviewers rated some groups as less attractive over time.

But that is a logical fallacy. We have no idea why the interviewers felt differently about different youth in the study; correlation is not causation. In fact, according to Kaufman's reading of the data, the interviewers' ratings barely agree with one another:

The low convergence of ratings finding suggests that in this very large and representative dataset, beauty is mostly in the eye of the beholder. What we are looking at here are simple ratings of attractiveness by interviewers whose tastes differ rather strongly. For instance, one interviewer (no. 153) rated 32 women as looking "about average," while another interviewer (no. 237) found almost all 18 women he rated to be "unattractive."
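Rater disagreement like this matters for group comparisons. If interviewers differ in how harshly they rate, and if they are not randomly assigned to respondents, group averages can differ even when the groups themselves do not. Here is a hypothetical simulation sketch of that confound; the number of interviewers, the spread in leniency and the 75/25 assignment split are invented for illustration and are not estimates from Add Health.

```python
import numpy as np

rng = np.random.default_rng(1)
n_raters, n_people = 40, 4000  # hypothetical sizes

# Hypothetical: each interviewer has a personal leniency offset on the rating scale
leniency = rng.normal(0.0, 0.8, n_raters)
harsh = leniency < np.median(leniency)  # the harsher half of the interviewers
harsh_ids = np.flatnonzero(harsh)
lenient_ids = np.flatnonzero(~harsh)

group = rng.integers(0, 2, n_people)         # two groups, identical true-score distributions
true_score = rng.normal(0.0, 1.0, n_people)

# Non-random assignment: group 1 meets a harsh interviewer 75% of the time, group 0 only 25%
p_harsh = np.where(group == 1, 0.75, 0.25)
gets_harsh = rng.random(n_people) < p_harsh
rater = np.where(
    gets_harsh,
    rng.choice(harsh_ids, n_people),
    rng.choice(lenient_ids, n_people),
)

rating = true_score + leniency[rater] + rng.normal(0.0, 0.5, n_people)

# Naive comparison of group means ignores who did the rating
naive_gap = rating[group == 1].mean() - rating[group == 0].mean()

# Crude adjustment for this null scenario: subtract each interviewer's mean rating
# (a proper analysis would fit a regression with rater fixed effects)
rater_means = np.array([rating[rater == r].mean() for r in range(n_raters)])
adjusted = rating - rater_means[rater]
adjusted_gap = adjusted[group == 1].mean() - adjusted[group == 0].mean()

print(f"naive group gap:    {naive_gap:.2f}")
print(f"adjusted group gap: {adjusted_gap:.2f}")
```

The raw comparison shows a sizable gap that comes entirely from who did the rating. Accounting for the interviewer removes it, which is possible only when interviewer identifiers are available, and ideally the interviewers' characteristics too.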

Kanazawa also reports that Black women rated their own attractiveness higher than interviewers rated it, yet there is no one-to-one relationship between self-identified race and perceived race. The two can be completely different: for example, I could self-identify as Hispanic, but my interviewer, seeing my dark skin, might perceive me as Black. Hence, Kanazawa's conclusions are nonsensical.

Kanazawa surmises that Black women's lower attractiveness might be due to low estrogen and high testosterone. Yet high estrogen and low testosterone levels are a leading cause of fibroids, which disproportionately affect Black women, especially Black women who are overweight. Black women have also been found to have higher levels of estrogen in a study on breast cancer. Finally, Kanazawa offended his fellow Psychology Today bloggers in 2008 with his post "The power of female choice: Fat chicks get laid more," whose thesis contradicts his supporting theory here. It all leads me to wonder whether this is some grand practical joke.

Beyond its creepiness, its reliance on unscientific conjecture and its abuse of factor analysis, I see a more central flaw in Kanazawa's method. Since the interviewers' assessment data were never intended to be used for an analysis such as Kanazawa's, the survey was not designed to capture the information he needs. In fact, the interviewer's assessment of the interviewee's attractiveness is mentioned nowhere in the study monograph, nowhere on the website and nowhere in the study design materials. (I emailed the study designers to ask why they collected this information in the first place, and will update this post below if they answer.)

Why was the study undertaken? According to the study website, it was in response to a mandate by the US Congress in the NIH Revitalization Act of 1993, where Congress asked a division of the NIH to "provide information about the health and well-being of adolescents in our country and about the behaviors that promote adolescent health or that put health at risk" with "a focus on how communities influenced the health of adolescents."

The Add Health study measures hundreds of variables. One has to wonder: why pick only race? Especially when the results of your "study" are so unabashedly weak? Seeing that Kanazawa based his findings on such a tenuously related study, I wonder how many other studies he scoured for evidence to support his point. This sort of "fishing" for results to support a predetermined conclusion leads to bad science, period.
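The cost of fishing can be put in numbers. Test enough unrelated variables against an outcome at the conventional 5% level and roughly one in twenty will look "significant" by chance alone. The sketch below assumes pure noise throughout, with a hypothetical 200 variables and 1,000 respondents, and uses the large-sample cutoff |r| > z / sqrt(n) for a two-sided test of a correlation. It also applies a Bonferroni correction, one standard safeguard, which divides the significance level by the number of tests.

```python
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(2)
n_people, n_vars = 1000, 200  # hypothetical sizes, chosen for illustration

outcome = rng.normal(size=n_people)               # pure noise standing in for an outcome
candidates = rng.normal(size=(n_people, n_vars))  # unrelated candidate variables, also noise

# Pearson correlation of each candidate with the outcome: standardize, then average products
z_out = (outcome - outcome.mean()) / outcome.std()
z_cand = (candidates - candidates.mean(axis=0)) / candidates.std(axis=0)
r = (z_cand * z_out[:, None]).mean(axis=0)

# Large-sample two-sided cutoff for |r| at the conventional 5% level
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
false_hits = int(np.sum(np.abs(r) > z_crit / np.sqrt(n_people)))

# Bonferroni correction: test each variable at alpha / n_vars instead
z_bonf = NormalDist().inv_cdf(1 - alpha / (2 * n_vars))
bonferroni_hits = int(np.sum(np.abs(r) > z_bonf / np.sqrt(n_people)))

print(f"'significant' correlations with no real effect: {false_hits} of {n_vars}")
print(f"after Bonferroni correction: {bonferroni_hits} of {n_vars}")
```

With 200 null variables, about 10 spurious "findings" (5% of 200) are expected on average before correction; the exact count depends on the random seed. An analyst who reports only those hits, and not the search that produced them, is presenting noise as evidence.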

I agree with Psychology Today blogger Sam Sommers, PhD, of Tufts University, when he concludes:

Like it or not, the burden is higher when you're a scientist blogging about science. And anyone who can only think of one explanation for an observed difference in a data set might simply be incapable of meeting that high burden.

To quote Kanazawa, a little bit of logic goes a long way. Seeing that his work is rife with logical errors, Kanazawa should be criticizing himself.

I drafted this post after spending a couple of days sorting through my emotions on Kanazawa's work. Seeing that the man clearly relishes his role as an agent provocateur, I knew I could not impact him or those who respond to his work from a place of emotion. He has made that much clear.

From my incessant reading of blog responses and comments, I have encountered the sentiment that because Kanazawa's question was immoral to ask, his results are invalid. I agree with all my heart and soul that the way he framed his so-called "research question" is offensive, racist and harmful. As I tweeted after reading Kanazawa's post, "Imagine a little Black girl reading this filth. [Toni Morrison's novel] The Bluest Eye is not history to her. It's reality." I want to protect that little girl – and wish I could heal all the little girls that came before her and grew up into beautiful women like this one, made to feel ugly by a racist society. I stand in solidarity with Black women and hope you will heed this blog's cry to stand stronger than ever in self-love.

The intent behind a question can establish an immoral line of inquiry and instigate immoral research methods (see the Nazi doctors' experiments). But a question itself is not evil. Scandalous, offensive and sometimes frightening questions are often at the root of important scientific inquiry. When backed by sufficiently strong evidence, these questions drive us toward the truth (see, e.g., "the Earth is round").

I agree with Psychology Today blogger Mikhail Lyubansky, PhD, when he says, "[e]xtraordinary claims ... require extraordinary evidence and editorial oversight." This does not amount to censorship; it means requiring that an inquiry bring us closer to the truth, not farther from it. Kanazawa does not earn censure with the political incorrectness of his question, but earns social and scientific irrelevance through the weakness of his research. This irrelevance earns Kanazawa a special place in hell in today's link-driven media economy, one where no one will hear him scream. One week later, neither Kanazawa nor Psychology Today's editors have published any official defense, apology or explanation. The silence is deafening.

About the Author: Khadijah M. Britton, JD, is founder of BetterBio, a Massachusetts-registered nonprofit and fiscally sponsored project of the 501(c)(3) Fractured Atlas whose mission is to empower journalism that reinforces the intimate connection between life and science. BetterBio provides a platform for comprehensive science reporting, challenging us to ask hard questions and debunk dangerous myths while addressing our collective social responsibility. Khadijah also serves as a post-graduate research fellow in antibiotic policy under Professor Kevin Outterson at Boston University School of Law while she completes her Master's in Public Health at Boston University School of Public Health and studies for the bar exam.

Edit (5-26-2011): Statement from Add Health regarding Kanazawa’s blog post to Psychology Today. Carolina Population Center, University of North Carolina at Chapel Hill

May 23, 2011

On May 16, 2011, Dr. Satoshi Kanazawa, an evolutionary psychologist associated with the London School of Economics, posted a blog on the website of Psychology Today. The blog, which was written for a publication called The Scientific Fundamentalist, made a series of contentious claims including that African-American women are, on average, less attractive than women of other races. A flurry of responses ensued, and the essay was subsequently removed from the Psychology Today website. Since then, commentators and members of the public have raised concerns about the source and quality of data upon which Kanazawa based his blog post. Add Health would like to respond to these concerns.

The data Kanazawa used for his research were drawn from the National Longitudinal Study of Adolescent Health (Add Health), a congressionally-mandated study funded by the U.S. National Institutes of Health. Add Health data are available in two forms: a "public use" data set, which includes data from a subset of participants, and a "contractual" or "restricted-use" data set, which includes the full set of variables and participants. The "restricted-use" data are available to researchers who have appropriate research credentials (e.g., post-graduate degree) and an Institutional Review Board in their research institution that ensures their use of data security procedures required by Add Health to protect data and participant privacy and confidentiality. Kanazawa applied for and was granted access to these restricted data, as have thousands of other researchers. Because Add Health was congressionally mandated and funded by the National Institutes of Health, these data are a public resource. Add Health has sought to make its data widely available to the scientific community of qualified U.S. and international researchers while stringently meeting its obligation to protect the confidentiality of its participants. Add Health does not stipulate what research topics can or cannot be studied and does not censor research findings. As do other studies, Add Health relies on the scientific peer-review process to evaluate the merits of any given analysis of project data.

Regarding the merits of Kanazawa’s research, we note that this was not a peer-reviewed research article, but a blog. Kanazawa based his blog post on data derived from interviewer ratings of the respondents that were recorded confidentially after the interview was completed and the interviewer had left the interview setting. It is a widely-used and accepted survey practice for interviewers and researchers to include such post-survey completion remarks. These remarks provide both an additional observation about the respondent and data on the context of the interview for researchers to assess data quality. In this instance, Kanazawa chose to present interviewer ratings of respondent attractiveness, one component of interviewer post-survey remarks. Because Kanazawa chose to report his results in a blog, his methods and analysis were not subject to the mainstream peer review process that evaluates the scientific quality of research and determines the merit of the work. Because the methods that would be presented in peer-reviewed research are not included in the blog, it is not possible for other individuals to evaluate the soundness of his methods. However, the subject matter – perceptions of others’ attractiveness – has been studied for decades in diverse fields such as social psychology, sociology, economics, and public health. Add Health chose to include these items – among others in the remarks section – for several reasons:

Interviewer ratings of respondent attractiveness represent a subjective "societal" perception of the respondent’s attractiveness. We included these items because there is a long line of research evidence that indicates that perceived attractiveness is related to important health and social outcomes, including access to health care, health education and instruction, job search, promotions, academic achievement, and social success in friendship and marriage. For example, males who are rated more highly attractive tend to have higher wages, shorter periods of unemployment, and greater success in the job market*. In Add Health, we measure respondents’ self-perceptions and in the case of interviewer ratings, others’ perceptions. Despite one’s own perception of one’s intelligence, identity and appearance, often societal perceptions matter as well, and matter in ways that research needs to understand to inform policies to prevent discrimination, unequal access to resources, and social inequality.

Because the interviewer’s perception is subjective, researchers need to account for the characteristics and life experiences of the interviewer in interpreting their ratings. A wealth of research on perceived attractiveness (that is, as perceived by others, not oneself) has shown that such ratings vary according to the characteristics of the rater. For example, a male interviewer might rate a female’s attractiveness according to different criteria than a female interviewer rating the same female’s attractiveness. Other interviewer characteristics that are important to take into account are age, race, ethnicity, education, geographic location, and life experiences, in general. Notably, several characteristics of the interviewers are available in the restricted use Add Health dataset at Waves 3 and 4. It is these data (e.g., interviewer age, sex, race, ethnicity, education) that might more usefully inform an analysis undertaken to investigate the role of other-perceived versus self-perceived attractiveness on some outcome of interest (employment, health, etc).

In response to Kanazawa’s blog, Add Health publicly addressed whether this was a valid scientific interpretation of the data. Below, we post comments from National Public Radio’s interview with Dr. Kathleen Mullan Harris:

"The director of the Add Health project, Kathleen Mullan Harris, contradicted Kanazawa on the nature of her project's research in a telephone interview Tuesday. The longitudinal study, funded by the federal National Institutes of Health, also asked interviewers to describe their subjects' behavior during interviews, ethnicity, and other characteristics.

‘He's mischaracterizing the objectiveness of the data — that's wrong. It's subjective. The interviewers' data is subjective,’ said Harris, who is also a professor of sociology at the University of North Carolina at Chapel Hill.

‘The empirical analysis does not account for the characteristics of the interviewers, which influence their observation,’ Harris said, listing such elements as race, ethnicity, sex, education and life experiences."

Add Health is the largest, most comprehensive longitudinal study of adolescents and young adults ever undertaken in the United States. Add Health began with an in-school questionnaire administered to a nationally representative sample of more than 20,000 students in grades 7-12 in 1994-95. The study followed the cohort into young adulthood with four in-home interviews, the most recent in 2008, when respondents were aged 24-32. Add Health combines extensive longitudinal survey data on respondents’ social, economic, psychological and physical well-being with rich contextual data on their families, neighborhoods, communities, schools, friendships, peer groups, and romantic relationships. This provides unique opportunities for studying how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood. There are more than 8,000 Add Health data users who have published thousands of peer-reviewed research articles, many of which have informed public health programs and policies to improve the health and well-being of young people in America. More information about Add Health can be found at www.cpc.unc.edu/addhealth.

Dr. Kathleen Mullan Harris,

Director and Principal Investigator of Add Health

_____________________________________________________________________________________

* Eagly, A. H., Ashmore, R. D., Makhijani, M. G., & Longo, L. C. (1991). What is beautiful is good, but. . .: A meta-analytic review of research on the physical attractiveness stereotype. Psychological Bulletin, 110(1), 109–128.

French, M. T., P. K., Homer, J. F., & Tapsell, L. M. (2009). Effects of physical attractiveness, personality, and grooming on academic performance in high school. Labour Economics, 16(4), 373–382. DOI: 10.1016/j.labeco.2009.01.001

Hamermesh, D. S., & Biddle, J. E. (1994). Beauty and the labor market. The American Economic Review, 84(5), 1174–1194. http://www2.econ.iastate.edu/classes/econ321/orazem/hamermesh_beauty.pdf

Hosoda, M., Stone-Romero, E. F., & Coats, G. (2003). The effects of physical attractiveness on job-related outcomes: A meta-analysis of experimental studies. Personnel Psychology, 56(2), 431–462.