The idea behind quantifying personality is deceptively simple: personality refers to predictable differences in behavior between people. Those differences should be reasonably reliable. That is, they ought to hold constant across different types of situations. Those differences should also be reasonably stable, which means they should be consistent over time.

For example, you might score high on the openness factor if you answer "yes" to questions like "I spend time reflecting on things," and you might score low on the extraversion scale if you answer "no" to questions like "I talk to a lot of different people at parties." According to personality theory, your answers to those questions shouldn't change all that much as you grow older, nor should they be different if you complete the survey at home or at the office or at a shopping mall.

Based on this definition of personality, it should be obvious that personality is not limited to humans. Indeed, animal behavior researchers are also interested in defining and quantifying personality. If measuring and describing personality is complicated for humans, it becomes vastly more so for animals. How are these individual differences in predictable responses measured, and classified? How are they even identified? What measurements should be used, and what traits do they measure? In a new paper in press in the journal Behavioural Processes, Noelle M. Watanabe and colleagues from the UCLA Departments of Ecology & Evolutionary Biology and Psychology explore these questions using an unlikely animal model: the Caribbean hermit crab (Coenobita clypeatus).

Watanabe begins by cataloging some of the inconsistencies pervading the literature when it comes to personality. For example, researchers might be interested in a trait called "boldness-shyness," and define boldness as a willingness to take risks. They might therefore use exploration tests to assess boldness. Or, they might define boldness by an animal's response to novelty, and use a test in which the animal is exposed to a new object. Other researchers, however, might use the same sorts of tests to measure "exploration-avoidance," a different trait. The problem is, it isn't necessarily the case that an animal's response to a new object relies on different psychological mechanisms than its response to a new environment. Those two types of tests, therefore, might actually be measuring the same thing. Or, an animal's willingness to explore a new environment might not reflect a risk assessment, while the animal's willingness to explore a new object might be perceived as risky. Needless to say, this sort of inconsistency in defining and measuring personality leads to confusion for researchers, especially as they try to take stock of the broader literature.

In an attempt to shed some light on these issues, the researchers measured hermit crab behavior in four different types of manipulations. First was the "flip-over" test. They measured how long it took before the crabs would re-emerge after they had been turned upside down.

Second was the "wooden box" test. They moved the crabs into a wooden box and again measured how long it took before the crabs would re-emerge from their shells to explore the new environment.

Third was the "predator" test. They clamped the crabs' shells to a board in order to immobilize them in front of a computer monitor. They measured how long it took before each crab would re-emerge after clamping. Then, they showed a video of an approaching predator on the screen. They measured how quickly the crab would hide, how long it took to re-emerge, and how many trials it took before it would habituate to the video and stop hiding.

Fourth was the "shock" test. They moved the crabs into a Skinner box, which, like the wooden box, was an unfamiliar environment. Therefore, they measured how long it took before the crabs would re-emerge from their shells, just as in the wooden box test. After emerging, the crabs were administered a brief electric shock, which caused them to go back into hiding. This allowed the researchers to measure, once again, how long it took before the crabs would re-emerge from their shells.

Altogether, they had four assessments that they predicted would measure aspects of "exploration-avoidance," and four measurements that they thought would measure "boldness-shyness." Remember, personality theory holds that individual measurements of the same trait should correlate with each other and should be similar over time.
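To make those two requirements concrete, here is a minimal sketch of how they could be checked with a Pearson correlation. The emergence times are invented for illustration and are not data from the paper, and the paper may well have used a different correlation statistic. The names `pearson_r`, `stability`, and `convergence` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical emergence latencies in seconds for five crabs (made-up numbers).
wooden_box_day1 = [20, 35, 50, 65, 80]
wooden_box_day2 = [22, 30, 55, 60, 85]
skinner_box_day1 = [25, 40, 45, 70, 90]

# Same test, different days: is each crab's response similar over time?
stability = pearson_r(wooden_box_day1, wooden_box_day2)

# Different tests of the same hypothesized trait: do they agree with each other?
convergence = pearson_r(wooden_box_day1, skinner_box_day1)

print(f"stability across days: {stability:.2f}")
print(f"agreement between tests: {convergence:.2f}")
```

A value near 1 means the crabs that were slow to emerge in one measurement were also slow in the other; a value near 0 means the two measurements tell you little about each other.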

First, the good news: the crabs' responses were highly consistent for each particular measurement. Just as your response to "I talk to a lot of different people at parties" would be the same today and tomorrow, the crabs' responses to these tests were equally predictable.

The crabs responded similarly to the wooden box and to the Skinner box, as expected. Both were thought to be measurements of exploration-avoidance. Among boldness-shyness tests, they found correlations among three of the four measurements.

However, they also found correlations across predicted categories. For example, there were correlations between the crabs' responses to the wooden box test, an exploration-avoidance measurement, and the time to re-emerge after seeing the predator, a boldness-shyness measurement. And the time to re-emerge after being clamped to the board (exploration-avoidance) was correlated with the time to hide after seeing the predator (boldness-shyness). The crabs' response to the flip-over test - assumed to be a measurement of exploration-avoidance - correlated with nothing. The list goes on.

The results seem clear: the tests used by the researchers, each of which was supposed to be associated with a given personality trait, didn't fall into line as smoothly as predicted. Case closed?

Watanabe and her colleagues then used a different statistical technique, called a principal component analysis, to analyze the data. This time, they got different results. This analysis found, for example, a relationship between the flip-over test and the wooden box test. This was expected, as both were supposed to measure exploration behavior, but the flip-over test did not originally correlate with anything. And both of those tests were related to the time it took the crabs to re-emerge after seeing the predator, which was ostensibly a measurement of boldness.
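For readers curious about what a principal component analysis does, here is a minimal sketch. It builds a correlation matrix from three measures and extracts only the first component by power iteration, using nothing beyond the Python standard library. The data are invented, the function names are hypothetical, and the paper's actual procedure (how many components were kept, whether they were rotated) is not reproduced here.

```python
import math

def standardize(column):
    """Rescale a list of values to mean 0 and sample standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def correlation_matrix(columns):
    """Pairwise Pearson correlations between the given measures."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z]
            for zi in z]

def first_component(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(k)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(row[j] * v[j] for j in range(k)) for row in matrix]
    eigenvalue = sum(vi * mvi for vi, mvi in zip(v, mv))
    return eigenvalue, v

# Hypothetical latencies in seconds for six crabs (made-up numbers).
names = ["flip-over", "wooden box", "time to hide"]
flip_over = [12, 30, 18, 45, 25, 60]
wooden_box = [15, 28, 22, 50, 20, 55]
time_to_hide = [3, 1, 4, 2, 5, 2]

corr = correlation_matrix([flip_over, wooden_box, time_to_hide])
eigenvalue, weights = first_component(corr)

print(f"share of variance on first component: {eigenvalue / len(names):.2f}")
for name, weight in zip(names, weights):
    print(f"{name}: weight {weight:+.2f}")
```

Measures with large weights of the same sign on a component tend to rise and fall together, which is the sense in which tests are said to group on a component. Unlike a single pairwise correlation, each component is computed from all of the measures at once, which is one reason the two approaches can paint different pictures of the same data.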

The researchers note that if they had used only the wooden box test, but not the predator test, to assess exploration-avoidance, they might have come to the wrong conclusions, assuming that the crabs would respond similarly in both situations. After all, the theory suggested they would. In reality, their responses to those two tests were very different. If they had just used the flip-over test and the wooden box test, and analyzed their results using a correlation statistic, they might have concluded that the tests measured different traits. But if they had used only a principal component analysis, they might have concluded that they measured the same trait! The sad reality is that many studies use only one test to begin with, while this study shows that using multiple tests may not even be enough.

What can be learned from this apparent mess? For starters, multiple assessments should be used for each hypothesized trait, and those tests should be administered multiple times. Several types of statistical tests should be applied to the data. These can help offer different perspectives for interpreting the data, and can in turn lead to further investigations. This needs to be done carefully, however, as running too many statistical tests raises the possibility of finding a positive result by accident, that is, a false alarm.
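The false alarm problem can be put in numbers. The sketch below assumes that every test is independent and uses the conventional 0.05 significance threshold; neither assumption comes from the paper, and the function names are hypothetical. It counts the pairwise comparisons possible among eight measures and computes the chance that at least one of them comes up "significant" by luck alone, along with the stricter per-test threshold given by the standard Bonferroni correction.

```python
from math import comb

def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false alarm across independent tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test threshold that keeps the overall false alarm rate near alpha."""
    return alpha / num_tests

num_measures = 8                     # four measures per hypothesized trait
num_pairs = comb(num_measures, 2)    # every possible pairwise correlation

print(f"pairwise comparisons: {num_pairs}")
print(f"chance of at least one false alarm: {familywise_error_rate(num_pairs):.2f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold(num_pairs):.4f}")
```

Under these assumptions, correlating every measure with every other gives 28 comparisons and roughly a three-in-four chance of at least one false alarm, which is why adding more statistical tests has to be balanced against a stricter standard for each one.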

Most importantly, scientists should examine their data free from pre-established biases about which tests are associated with what traits. For example, the researchers began with an assumption that exploration-avoidance and boldness-shyness are two dissociable personality traits. They then made assumptions about which tests served as measurements for either trait. Both assumptions proved incorrect.

"While it seems that the initial response to detecting the visual predator and the number of trials before behavioral habituation to the repeated presentation of the predator are logically connected, the initial [time to re-emerge after clamping] also loads on this factor despite the visual predator having not yet appeared." Eschewing the initial assumptions, Watanabe muses, "Perhaps the action of being placed into a restraining clamp activates the same behavior class as does detecting an approaching predator. In fact, it may even simulate the handling cues that would be present following being captured and held by a predator. Thus, we might conclude that [this set of tests] groups behaviors related to predator detection, rather than a general personality trait like boldness-shyness."

The lesson here is that extreme care is necessary when classifying, categorizing, or labeling behavioral assessments. Instead of starting with conjectures about personality traits and then deciding what tests to use to assess them, it is more reasonable to first determine which behaviors may be related, and then use those sets of related behaviors to derive hypotheses about personality. This should allow researchers to then make better inferences about how tests map onto personality in species that could be very different from our own. "Animals," the researchers wisely caution, "may perceive situations very differently from us, and thus respond to them in different ways."

Watanabe, N.M., Stahlman, W.D., Blaisdell, A.P., Garlick, D., Fast, C.D. & Blumstein, D.T. (2012). Quantifying personality in the terrestrial hermit crab: Different measures, different inferences. Behavioural Processes. DOI: 10.1016/j.beproc.2012.06.007

Image: Caribbean hermit crab via Wikimedia Commons/Zoofari.