May 25, 2011

The Politics of the Null Hypothesis

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American.

To what degree these and other differences originate in biology must be determined by research, not fatwa. History tells us that how much we want to believe a proposition is not a reliable guide as to whether it is true. --Steven Pinker, commenting on Lawrence Summers in The New Republic

In late April, Dr. Angela Lee Duckworth and her team published a study demonstrating that some of the variability in IQ test results, and in the life outcomes known to be correlated with IQ scores, depends significantly and substantially on how motivated the test subject was. As Duckworth herself points out in the paper, this is a fairly humdrum result. Those who developed IQ testing predicted that this would happen:

Despite efforts to "encourage in order that every one may do his best" on intelligence tests (ref. 41, p. 122), pioneers in intelligence testing took seriously the possibility that test takers might not, in fact, exert maximal effort. Thorndike, for instance, pointed out that although "all our measurements assume that the individual in question tries as hard as he can to make as high a score as possible . . . we rarely know the relation of any person’s effort to his maximum possible effort" (ref. 42, p. 228). Likewise, Wechsler recognized that intelligence is not all that intelligence tests test: "from 30% to 50% of the total factorial variance [in intelligence test scores remains] unaccounted for . . .this residual variance is largely contributed by such factors as drive, energy, impulsiveness, etc. . . ." (ref. 9, p. 444).

Yet this study that should be eliciting simple head nods was published in PNAS and is generating a fair amount of buzz. Ed Yong covers it nicely, emphasizing both underlying ability and motivation as factors in test results and educational and employment outcomes. ScienceNOW reports the findings, and Maria Konnikova of Artful Choice notes that motivation is a factor over which society has a certain amount of control.

The study is also receiving less positive notices. Steve Sailer at VDARE says the study tells us nothing new because IQ tests are still predictive, despite the researchers' determination that a model that includes motivation predicts life outcomes better than one that doesn't. StatSquatch runs a separate analysis taking out some of the data, but declines to submit the analysis as a peer-reviewed comment on the paper. And at EconLog, Bryan Caplan also visits the motivation factor:

For example, instead of saying, "IQ tests show that people are poor because they're less intelligent - and intelligence is hard to durably raise" we should say, "IQ tests show that people are poor because they're less intelligent and less motivated - and intelligence and motivation are hard to durably raise." If, like me, you already believed in the Conscientiousness-poverty connection, that's no surprise.

The interesting thing about the disparity in views on this "non-controversial" study is how the views are divided. The straightforward reporting comes from science sites. The criticisms and assertions that the results are meaningless come from a linked group of political blogs. VDARE is an anti-immigration site; EconLog is an economics blog. StatSquatch is perhaps most easily defined by the rate at which those on the blogroll perspire over "political correctness."

It seems that Pinker's concerns that political influences may attempt to stifle science, noted in the quote at the top of this post, have some basis in fact. However, despite the direction of his concerns, those political influences in this case aren't coming from the left, and they aren't reflecting any anti-genetic bias. In fact, the comments on some of the coverage strongly demonstrate that it is Duckworth's "position" that is presumed to be inherently flawed:

Thanks. I've been meaning to look at the individual studies in Duckworth's paper myself, because her results simply seemed implausible. She is something of an anti-IQ warrior, and has published sloppy studies before, too.
It's interesting that when a non-specialist journal publishes a paper on IQ, it's almost always something that tries to question and minimize the significance of IQ. [source]

The vast majority of papers on IQ that make it to publication outside of specialist venues are those that take a contemptuous attitude towards the topic and refuse to engage the totality of evidence. In the meantime Paul Thompson at UCLA and BGI are respectively moving forward the IQ/neuroscience and IQ/genomics fields respectively. Keep pretending that doesn’t exist. [source]

Nothing about the field of IQ studies is free of political influence. It's naive to believe that any kind of research on a purported measure of individual merit could be politics-free in a self-proclaimed meritocracy with wide inequalities. Binet's original work was meant to determine which children should have access to additional educational resources. IQ scores are used occasionally to sort out "inappropriate" candidates for various jobs, including those whose IQs are too high for a role. IQ as a proxy for merit is used to argue that a group does or does not face discrimination in educational or career opportunities. This is all terribly political.

The question isn't whether there are politics surrounding this issue or where. They're everywhere. The question is where the politics gets in the way of the science. Again, the answers don't favor Pinker's view of a fatwa against genetic explanations of individual differences.

No one is pretending BGI Hong Kong doesn't exist or that it isn't looking for genes associated with variability in IQ scores. No one is issuing fatwas to stop them or even protesting their work. Some people are questioning IQ as a proxy for intelligence, but no one is saying the work shouldn't go forward until a better proxy is found. Similarly, no one is pretending that Paul Thompson isn't doing some fascinating work in brain imaging and variability in brain structure.

What is in dispute is the likelihood that genes will be found that account for any significant fraction of the variability found in human intelligence and whether the current literature on the topic is sufficient to predict that. Here is where disagreement with Thompson comes into play. He has published a number of papers with "genetics" in the title ("Genetic influences on brain structure," "Genetics of brain structure and intelligence," "Genetics of brain fiber architecture and intellectual performance") that involve no genetic testing whatsoever.

Instead, these studies rely on degree of relatedness (usually between identical and fraternal twins) as a measure of shared genes. This sounds reasonable, and to a degree it is. However, unless researchers can measure or control for the way genes unrelated to intelligence interact with the environment, these studies can't tell us how much variation in brain structure is due to shared genes that code for intelligence and how much is due to shared genes that code for something else, such as an illness that limits time in school. Until they are designed to look for genetic influences in addition to environmental influences, these studies are useless for their intended purpose.

The degree of shared environment is a problem for studies of twins raised apart, as well. In 2001, Jay Joseph published a critique of these studies. He noted that the term "raised apart" has very little meaning for many of these pairs of twins: they were separated at a late age, separated for only a brief period of years, or placed with relatives in the same town. He also described what a twin study that could control for environment would look like:

Although no conclusions about genetic influences on personality differences can be drawn from the MISTRA data, a description of a valid MZA study seems in order. First, a systematic ascertainment of twins would be undertaken. In addition to Juel-Nielsen's (1965/1980) criteria that the twins be alive, reared apart from early life, and monozygotic, the twins must not have been aware of each other's existence until they are contacted by the researchers. As a way of determining whether selective placement had occurred in the sample, each twin pair's rearing-family socioeconomic status would be ranked and correlated. Once an experimental group of MZAs is collected in this manner, it would be compared with a control group of biologically unrelated pairs of strangers sharing the following characteristics: They should be the same age, they should be the same sex, they should be the same ethnicity, the correlation of their rearing environment socioeconomic status should be similar to that of the MZA group, they should be similar in appearance and attractiveness as determined by blinded raters, and the degree of similarity of their cultural backgrounds should be equal to that of the MZA twins. Finally, they should have no contact with each other until after they are evaluated and tested. These controls will constitute the unrelated group. Of course, it is not possible for unrelated pairs to share a common prenatal environment, which twins do share.

Cosma Shalizi, a professor of statistics who deals with the statistics of complex networks, also explains why the current literature is insufficient to support claims that genetic differences underlie observed differences in IQ scores. The article itself is technical, but the conclusions are fairly simple to follow:

1. The most common formulae used to estimate heritability are wrong, either for trivial mathematical reasons (such as the upward bias in the difference between monozygotic and dizygotic twins' correlations), or for substantive ones (the covariance of monozygotic twins raised apart neglects shared environments other than the family, such as maternal and community effects).

2. The best estimate I can find puts the narrow heritability of IQ at around 0.34 and the broad heritability at 0.48.

3. Even this estimate neglected heteroskedasticity, gene-environment interactions, gene-environment covariance, the existence of shared environment beyond the family, and the possibility that the samples being used are not representative of the broader population.

4. Now that people are finally beginning to model gene-environment interactions, even in very crude ways, they find it matters a lot. Recall that Turkheimer et al. found a heritability which rose monotonically with socioeconomic status, starting around zero at low status and going up to around 0.8 at high status. Even this is probably an over-estimate, since it neglected maternal effects and other shared non-familial environment, correlations between variance components, etc. Under such circumstances, talking about "the" heritability of IQ is nonsense. Actual geneticists have been saying as much since Dobzhansky at least.

5. Applying the usual heritability estimators to traits which are shaped at least in part by cultural transmission, a.k.a. traditions, is very apt to confuse tradition with genetics. The usual twin studies do not solve this problem. Studies which could don't seem to have been done.

6. Heritability is completely irrelevant to malleability or plasticity; every possible combination of high and low heritability, and high and low malleability, is not only logically possible but also observed.

7. Randomized experiments, natural experiments and the Flynn Effect all show what competent regressions also suggest, namely that IQ is, indeed, responsive to purely environmental interventions.
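To make the first point in the list above concrete: one widely used estimator of this kind is Falconer's formula, which doubles the difference between the monozygotic and dizygotic twin correlations. The sketch below is not taken from Shalizi's article or from any study cited here. The function name and the correlation values are hypothetical, chosen only to show the arithmetic.

```python
def falconer_estimates(r_mz, r_dz):
    """Classical twin-study decomposition using Falconer's formulas.

    r_mz: trait correlation between monozygotic (identical) twin pairs
    r_dz: trait correlation between dizygotic (fraternal) twin pairs
    Returns (h2, c2, e2): estimated shares of variance attributed to
    genes, shared environment, and non-shared environment plus error.
    """
    h2 = 2.0 * (r_mz - r_dz)  # heritability: twice the MZ-DZ difference
    c2 = 2.0 * r_dz - r_mz    # shared environment
    e2 = 1.0 - r_mz           # everything else, including measurement error
    return h2, c2, e2


if __name__ == "__main__":
    # Hypothetical correlations, for illustration only (not real data).
    h2, c2, e2 = falconer_estimates(r_mz=0.8, r_dz=0.5)
    print(f"h2={h2:.2f}  c2={c2:.2f}  e2={e2:.2f}")
```

Because the heritability figure is twice a difference of two sample correlations, any error in either correlation is doubled in the result. The formula also assumes that identical and fraternal twins share their environments to the same degree, which is the kind of assumption the critiques quoted here call into question.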

Despite the fact that these studies do not and cannot tell us that there is a genetic component to the variation in IQ, we still see genetic triumphalism like this 2009 article in The Economist:

Human geneticists have reached a private crisis of conscience, and it will become public knowledge in 2010. The crisis has depressing health implications and alarming political ones. In a nutshell: the new genetics will reveal much less than hoped about how to cure disease, and much more than feared about human evolution and inequality, including genetic differences between classes, ethnicities and races.
[…]

The trouble is, the resequencing data will reveal much more about human evolutionary history and ethnic differences than they will about disease genes. Once enough DNA is analysed around the world, science will have a panoramic view of human genetic variation across races, ethnicities and regions. We will start reconstructing a detailed family tree that links all living humans, discovering many surprises about mis-attributed paternity and covert mating between classes, castes, regions and ethnicities.

We will also identify the many genes that create physical and mental differences across populations, and we will be able to estimate when those genes arose. Some of those differences probably occurred very recently, within recorded history. Gregory Cochran and Henry Harpending argued in "The 10,000 Year Explosion" that some human groups experienced a vastly accelerated rate of evolutionary change within the past few thousand years, benefiting from the new genetic diversity created within far larger populations, and in response to the new survival, social and reproductive challenges of agriculture, cities, divisions of labour and social classes. Others did not experience these changes until the past few hundred years when they were subject to contact, colonisation and, all too often, extermination.

Well, it's now 2011, and we have yet to see any crises. We've yet to see any replicated gene-intelligence associations. We've yet to see anyone step up to do, or even fund, the kinds of studies (aside from direct genetic testing) that would be required to differentiate genetic effects from environmental effects. We've yet to see the kind of doubt in these researchers that leads to studies that test hypotheses rather than reinforce conclusions. We've yet to see any lessening of the certainty that these much-sought genes are just around this next corner...oh...well, then, the corner after that.

What we have seen is yet one more study that shows a significant environmental element that accounts for some of the variability in IQ scores. And here is where we've seen political condemnation--of the researcher, of inclusion of data, of publication practices, of the real-world (political) significance of the results. Here we've seen presumptions about what is and what is not a "plausible" mechanism to explain variation in IQ testing, and the interpretation of scientific results through the lens of those presumptions. Despite a long history of fruitful investigation into environmental effects, here is where we find the burden of proof being placed such that the tiniest criticism is sufficient for many to dismiss this study and all its implications.

Pinker asks, "Why are empirical questions about how the mind works so weighted down with political and moral and emotional baggage?" I'm not sure I can answer Pinker's question, but the current research landscape and the reaction this research receives in the larger world suggest that maybe he's been asking the wrong people.

References Cited:

Duckworth AL, Quinn PD, Lynam DR, Loeber R, & Stouthamer-Loeber M (2011). Role of test motivation in intelligence testing. Proceedings of the National Academy of Sciences of the United States of America, 108 (19), 7716-20 PMID: 21518867

Joseph, J. (2001). Separated Twins and the Genetics of Personality Differences: A Critique. The American Journal of Psychology, 114 (1). DOI: 10.2307/1423378

Chiang, M., Barysheva, M., Shattuck, D., Lee, A., Madsen, S., Avedissian, C., Klunder, A., Toga, A., McMahon, K., de Zubicaray, G., Wright, M., Srivastava, A., Balov, N., & Thompson, P. (2009). Genetics of Brain Fiber Architecture and Intellectual Performance. Journal of Neuroscience, 29 (7), 2212-2224. DOI: 10.1523/JNEUROSCI.4184-08.2009

Thompson, P., Cannon, T., Narr, K., van Erp, T., Poutanen, V., Huttunen, M., Lönnqvist, J., Standertskjöld-Nordenstam, C., Kaprio, J., Khaledy, M., Dail, R., Zoumalan, C., & Toga, A. (2001). Genetic influences on brain structure. Nature Neuroscience, 4 (12), 1253-1258. DOI: 10.1038/nn758

Toga, A., & Thompson, P. (2005). Genetics of brain structure and intelligence. Annual Review of Neuroscience, 28 (1), 1-23. DOI: 10.1146/annurev.neuro.28.061604.135655

About the author: Stephanie Zvan is a science fiction and fantasy writer with a career-stunting dedication to reality. She blogs at Almost Diamonds about whatever strikes her fancy, but her fancy is often struck by the necessary and uncomfortable intersection of science and politics. She also finds it difficult to resist the lure of arguments, particularly those that continually restart from the same points.
