August 8, 2011

"Anything But Country": What Factor Analysis Reveals About Our Tastes for Tunes

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American.

What’s your favorite type of music?

Lil’ Wayne? AC/DC? Anything but country?

We like to think that our musical preferences are somehow deeply unique and meaningfully representative of who we are as individuals. But what if I told you that when it boils down to it, we’re not all that different from each other? In fact, most of those seemingly "nuanced" differences in musical taste can be summed up by a mere five factors.

What Exactly Is A Factor – And What Is A Factor Analysis?

A factor can be thought of as an underlying concept that explains the variability in a given dataset.

To understand the basic theoretical idea behind what a factor analysis involves, take this picture of the Muppets from Sesame Street. Conceptually a factor analysis is like asking, "How can we explain the most about how these Muppets are different from each other while using the fewest adjectives possible?" Ideally, you’d want to use enough descriptors to be thorough without going overboard, perhaps focusing on 2-5 of the most crucial, defining differences. Individually describing every single Muppet wouldn’t be very helpful, nor would it be helpful to say, "Well, some of them are furry." However, if you posit that the most important features to focus on are furriness, color, and clothing, you’ve done a pretty good job of briefly (yet thoroughly) summarizing the main ways in which the Muppets differ from each other – and those features could also be thought of as factors.

Behind all of the numbers, figures, and statistics, this is the conceptual basis of a factor analysis.

Moving From Muppets To Music

In an effort to understand the science of musical taste, Peter Rentfrow, Lewis Goldberg, and Daniel Levitin conducted a factor analysis of musical preferences. Much like in the Sesame Street example, the researchers were looking for factors to explain differences in the data – only instead of Muppets, we are now examining people’s ratings of 15-second music samples. The final set of "factors" should do a thorough (yet succinct) job of explaining overarching "like" and "dislike" patterns in people's ratings (for example, a pattern showing that rock & roll fans also tend to like punk and heavy metal).

To conduct this analysis, the researchers first picked 52 songs that sound similar to well-known songs from several major genres, but never became popular (to avoid any bias from sheer overexposure).

A sampling of some songs that were chosen by the research team

They then asked all of the participants to rate how much they liked a 15-second clip of each song on a scale from 1 to 9.
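To see why ratings like these can reveal factors at all, it helps to know that a factor analysis starts from the correlations between items: clips that the same people tend to like (or dislike) together end up grouped. Here is a minimal Python sketch of that first step. The three clips and six listeners' ratings are invented for illustration and are not data from the study.

```python
from itertools import combinations
from math import sqrt

# Hypothetical 1-9 ratings from six listeners (NOT data from the study).
ratings = {
    "rock clip":      [8, 7, 9, 2, 3, 1],
    "punk clip":      [7, 8, 8, 3, 2, 2],
    "classical clip": [2, 3, 1, 8, 9, 7],
}

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Correlate every pair of clips across listeners.
correlations = {
    (a, b): pearson(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}

for (a, b), r in correlations.items():
    print(f"{a} vs. {b}: r = {r:.2f}")
```

In this toy data the rock and punk clips correlate strongly and positively, while the classical clip correlates negatively with rock. That is the kind of pattern a factor analysis summarizes across all 52 clips at once.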

Printed below is the full list of songs, and the numbers on the right are factor loadings. The Roman numerals at the very top (I, II, III, IV, and V) indicate the five factors that the researchers planned to find – the musical equivalent of five features summarizing the Muppets’ differences.

It’s important to note that there’s a reason why they ended up settling on five factors – they went through a bunch of different possibilities first, and five was the number that did the best job of balancing thoroughness with brevity. I'll discuss this more later.

In the meantime, you need to know two things to understand what these numbers mean: They range from -1.0 to 1.0, and a high number (preferably over 0.40) means that the song "loads onto" (or belongs to) that factor. Factor analysis is a bit like a puzzle; it’s not as if the statistical program spits out the adjectives for you. You have to look at what loads onto each "factor" and then figure out what the descriptors should be for yourself; there’s no "right answer" lying around simply waiting to be discovered.
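As a rough illustration of the 0.40 rule of thumb, here is a small Python sketch that assigns each clip to the factor it loads onto most highly. The loadings are made up for the example (they are not the values from the study's table), and the clip names are hypothetical. The factor names follow the labels the researchers settled on, in the column order described in this post.

```python
# Factor labels in column order (I through V), as described in this post.
FACTORS = ["Sophisticated", "Unpretentious", "Intense", "Mellow", "Contemporary"]

# Hypothetical loadings for illustration only; NOT the values from the study.
loadings = {
    "classical clip":   [0.82, -0.05, 0.02, 0.10, -0.12],
    "bluegrass clip":   [0.15, 0.71, -0.03, 0.08, -0.20],
    "heavy metal clip": [0.04, 0.01, 0.88, -0.14, 0.09],
    "ambiguous clip":   [0.12, 0.28, 0.22, 0.31, 0.35],
}

THRESHOLD = 0.40  # the rule-of-thumb cutoff mentioned above

def assign_factor(row, threshold=THRESHOLD):
    """Return the factor with the highest loading, or None if none clears the cutoff."""
    best = max(range(len(row)), key=lambda i: row[i])
    return FACTORS[best] if row[best] >= threshold else None

assignments = {clip: assign_factor(row) for clip, row in loadings.items()}

for clip, factor in assignments.items():
    print(f"{clip}: {factor or 'no clear factor'}")
```

Notice that the code only hands back column positions. The names in FACTORS had to be supplied by a person, which is exactly the "puzzle" part of the process.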

For example, in the first column, the samples displaying high numbers mostly consist of classical, jazz, and instrumental music; now you have to find a word that best represents those musical pieces. The researchers called it "Sophisticated," but you likely could have come up with other ideas. How about the second column? Country rock, New country, Mainstream country, Bluegrass, Rock ‘n’ Roll…in this case, the researchers went with "Unpretentious," but you probably could have found an equally plausible alternative (perhaps "country" or "rockabilly"). The third, fourth, and fifth columns were defined by the researchers as "Intense," "Mellow," and "Contemporary," respectively.

Now that we’ve seen the statistics, what does this look like graphically?

The researchers ran the analysis four times, each time planning for a different number of factors (2, 3, 4, or 5). This is a graphic representation of all four solutions, with the five-factor solution at the very bottom, and the two-factor solution on the second row. We actually get some good information from looking at each possible solution, which is why it’s really interesting to go back and look at all of the options rather than simply focusing on the researchers’ final five-factor solution. For example, the two-factor solution is highlighted in red below:

When the researchers looked for 2 factors, they ended up with Sophisticated Music vs. Everything Else. This actually tells us something fairly important: When you specifically try to find the single most important "this or that" distinction in musical preferences, you end up with sophisticated on one side, consisting mostly of classical, jazz, and instrumental music, and everything else on the other, including tunes ranging from country to heavy metal.

Let’s revisit the title of this post. Doesn’t it sometimes seem like everyone’s default response to the "favorite music" question is anything but country? That may be one of the more common responses, but it’s probably not true. What this answer likely indicates is a preference for some combination of rap, pop, and/or rock music, but it’s incredibly unlikely that these respondents have a particular affinity for Celtic music, Swedish death metal, or polka. In fact, here’s what the two-factor solution really tells us: When it comes to a "this" vs. "that" distinction in musical taste, it doesn’t come down to "country" vs. "everything else" – it comes down to "sophisticated" vs. "everything else." If someone really wants to give an all-or-nothing response to the favorite music question, it would probably be more accurate for him to say he likes anything but the high-brow stuff. Or, conversely, anything but the low-brow stuff.

Additionally, notice that the Sophisticated box remains consistent in every single row; what this tells us is that the songs from that original category don’t tend to move over and become a part of different categories, even as more potential options appear. This gives us even more insight into that musical category – not only is it conceptually distinct from everything else in the two-factor solution, but even when more categories open up and there's a chance for some of the songs in that "box" to be re-grouped with other samples, they don't budge. It really seems like the "classical, jazz, and instrumental" classification is its own beast, and it has very little in common with other types of music.

Once you allow for a third factor, some of the songs from that "everything else" category split off into another box called "Intense." What this means is that the rock, heavy metal, and punk songs are separating out from the rest of the songs, leaving the rest behind in that ambiguous catch-all category (now termed "Unpretentious" by the researchers). Once a fourth factor is added in, that catch-all box breaks down a bit more and makes room for a "Mellow" category.

(As a tip, you can actually tell which categories are "breaking down" and losing some of their songs to the new factors based on the numbers printed on the arrows. The closer the number is to 1.00, the more that box remains the same from row to row. The lower the number is, the more that box has changed.)
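One way to picture those arrow numbers, assuming they are correlations between listeners' factor scores in adjacent rows (my reading of the figure, not something spelled out in this post), is the Python sketch below. The scores for six listeners are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical factor scores for six listeners (NOT data from the study).
sophisticated_2f = [1.2, -0.4, 0.3, -1.1, 0.8, -0.8]    # two-factor solution
sophisticated_3f = [1.1, -0.5, 0.4, -1.0, 0.9, -0.9]    # three-factor solution
everything_else_2f = [0.5, 1.0, -0.7, 0.2, -1.3, 0.3]   # two-factor solution
intense_3f = [1.4, 0.1, -1.2, 0.9, -0.6, -0.6]          # three-factor solution

stable_arrow = pearson(sophisticated_2f, sophisticated_3f)
split_arrow = pearson(everything_else_2f, intense_3f)

print(f"Sophisticated -> Sophisticated: {stable_arrow:.2f}")
print(f"Everything Else -> Intense:     {split_arrow:.2f}")
```

In this toy example the first arrow comes out very close to 1.00 (the box barely changed), while the second is noticeably lower (the new box only partly overlaps with the old one).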

The final grouping of factors that the researchers settled on was a set that they cleverly termed MUSIC: Mellow, Unpretentious, Sophisticated, Intense, and Contemporary. What does this mean? Simple: These are the five most important descriptors involved in categorizing music and explaining differences in our musical tastes. If we like a piece of music from the "Mellow" box, we probably also like other music from the "Mellow" box.

Are Genres So Important After All?

So what does this mean for how we understand music?

Interestingly, this is one of the few studies on musical preferences that does not break the results down by genre. Classifying music based on this MUSIC model is not the same thing as saying "If you like rock music, you will like other rock music." It may seem like the MUSIC adjectives look like genres, but in reality, dozens of genres can fit into just one of the MUSIC categories. This study suggests that the specific "genre" of a song may not be so important after all; our tastes may be guided more by these underlying musical characteristics that span a wide variety of industry-imposed labels. If you like Barry White and Jack Johnson, categorizing those songs based on genre makes you seem idiosyncratic, but if you realize that they’re both Mellow, it doesn’t seem quite so strange.

Pandora’s Overcomplicated Box

Most likely, you are familiar with the website Pandora, which is built on the Music Genome Project. If not, the Music Genome Project is a well-known effort to "sequence" various songs; if you put in a beloved song or artist, Pandora plugs this preference into its algorithm to find other songs that you should like.

Essentially, the Pandora algorithm categorizes every song based on 400 musical characteristics, ranging from lead vocalist gender to the level of electric guitar distortion. Once you "thumbs up" a given song, the software automatically looks at that song’s scores on all 400 characteristics and then compares them to the scores of every other song in the database; whichever songs have the "closest" scores on the highest number of characteristics get played next.
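To make that matching idea concrete, here is a toy Python sketch of "closest scores on the highest number of characteristics." It is not Pandora's actual algorithm, which isn't public in this form; the song names, the five characteristics (standing in for 400), the 0-5 scoring scale, and the tolerance are all assumptions for illustration.

```python
# Hypothetical catalog: each song scored 0-5 on five characteristics
# (a stand-in for the ~400 described above). NOT real Pandora data.
catalog = {
    "song_a": [5, 1, 4, 0, 2],
    "song_b": [5, 1, 3, 0, 2],
    "song_c": [0, 4, 1, 5, 3],
    "song_d": [4, 2, 4, 1, 5],
}

def matching_traits(a, b, tolerance=1):
    """Count the characteristics on which two songs score within `tolerance`."""
    return sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance)

def recommend(seed, songs, tolerance=1):
    """Rank every other song by how many characteristics it shares with the seed."""
    seed_scores = songs[seed]
    others = [name for name in songs if name != seed]
    return sorted(
        others,
        key=lambda name: matching_traits(seed_scores, songs[name], tolerance),
        reverse=True,
    )

playlist = recommend("song_a", catalog)
print(playlist)
```

With this made-up catalog, song_b matches the seed on all five characteristics, song_d on four, and song_c on only one, so they would be queued in that order.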

It’s very impressive, and plenty of people use (and love) Pandora, but there’s one thing this research can tell us: All of that might not be necessary. How much do we need 400 characteristics when our musical taste patterns can be summarized so well by a mere five?

There’s even some anecdotal evidence to support the idea that despite all of the effort behind its creation, Pandora may be deferring to this five-factor model more than the sequencers even realize. I’ve heard countless stories of Pandora choosing seemingly random songs to play on a given station; one of my favorite personal anecdotes is the time I had the Lil’ Wayne Pandora station on in the background while I worked, and eventually realized that the last three songs it had played were Mandy Moore, Jessica Simpson, and Britney Spears. At the time this made no sense, but here’s the funny thing: According to the MUSIC model, this may have made perfect sense. After all, all four of those artists fall into that last "Contemporary" box. Lil’ Wayne and Britney may seem to have little in common on the surface, but conceptually, they’re more similar than artists like Rihanna (Contemporary) and AC/DC (Intense).

And to be fair, I can’t really complain. I do love both Weezy and Brit.

References: Rentfrow PJ, Goldberg LR, & Levitin DJ (2011). The structure of musical preferences: A five-factor model. Journal of Personality and Social Psychology, 100(6), 1139-1157. PMID: 21299309