Last year a retail industry client asked my research team why a competitor brand had a higher Net Promoter Score, or NPS, a widely used aggregate metric that measures how likely people are to recommend a brand or service to a friend. Although our client's NPS was lower than the competitor's, customers still had good things to say about the brand. In fact, they liked it a lot. So there was something else going on, something about the shopping experience rather than the brand itself, that we believed explained the difference.
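The score itself is straightforward to compute: respondents rate their likelihood to recommend on a 0-to-10 scale, those answering 9 or 10 count as promoters, those answering 0 through 6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch (the function name and sample ratings are my own, for illustration):

```python
def net_promoter_score(ratings):
    """Compute NPS from 0-10 'likelihood to recommend' ratings.

    Promoters rate 9-10, detractors rate 0-6 (7-8 are passives);
    the score is percent promoters minus percent detractors.
    """
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / n

# Ten responses: four promoters, four passives, two detractors
print(net_promoter_score([10, 9, 9, 10, 8, 7, 7, 8, 5, 3]))  # -> 20.0
```

Note that the passives drop out entirely, which is part of what makes the metric so aggressively aggregated.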

The research we conducted for the project revealed an interesting fact about measuring people. Brands obviously rely on data to understand their customers better. But they also seek something that may not leave a clear digital footprint. Big packaged-goods companies such as Procter & Gamble invest in ethnographers, focus-group moderators and other researchers who rely on their subjective judgment to generate insights. When it comes to understanding people as consumers, brands keep returning to a question that data answer only in part: What do my customers think?

I thought about this question as I read a new book by Christopher J. Phillips entitled Scouting and Scoring: How We Know What We Know about Baseball (Princeton University Press, 2019). Phillips, an assistant professor of history at Carnegie Mellon University, distinguishes between scorers, who calculate things such as WAR (wins above replacement), and scouts, who observe attributes such as athleticism and determine if a player “has what it takes.”

Fans of popular nonfiction will recognize scorers as the heroes of Michael Lewis’s 2003 book Moneyball. But Phillips says that the story is more complicated than the version Lewis presented, which pits cold, hard data against error-prone judgment. For example, if an outfielder drops a ball on a windy day in April and then does the same thing on a calm day in July, should each error be counted the same? Here, scorers have to rely on their judgment to make a call.

Yet scouts transform their subjective impressions into objective measurements all the time. After Major League Baseball introduced the draft in 1965, scouts began to rely on a 1-9 scale to systematically evaluate players. “The history of scoring and scouting certainly undermines any clear distinction between the claimed objectivity of scorers and the subjectivity of scouts,” Phillips writes.

Consider Henry Chadwick, a 19th-century sportswriter, statistician and early advocate for standardizing the scoring of baseball. Chadwick did not weigh hits based on how many bases a batter gained because he thought that it was too difficult to determine how many bases the batter really “earned.” Later scorers—sabermetricians, who get their name from the combination of the “Society for American Baseball Research,” or “SABR,” and “measurement”—did count how many bases a player earned per hit. They used the data to calculate slugging percentage, which weighs singles, doubles, triples and home runs.
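The weighting sabermetricians settled on is total bases per at-bat: a single counts once, a double twice, and so on. A quick sketch, with an invented stat line for illustration:

```python
def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at-bats: each hit weighted by bases gained."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

# Hypothetical season: 100 singles, 30 doubles, 5 triples,
# 25 home runs in 500 at-bats -> 275 total bases
print(round(slugging_percentage(100, 30, 5, 25, 500), 3))  # -> 0.55
```

This is exactly the judgment call Chadwick declined to make: the formula assumes every base reached was "earned" by the batter.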

Today there’s a statistic called exit velocity that controls for a flaw in slugging percentage—the fact that a slow-rolling infield single and a line-drive single that slams off the wall are counted the same—by tracking the speed of the ball the moment it comes off the bat. Exit velocity has become one of the most popular new metrics in baseball.

Why didn’t Chadwick think to account for slugging percentage and exit velocity? Part of the answer is technology—radar guns were not available until the 1970s, so it was impossible to measure exit velocity—but there’s a broader answer here, and it’s where the statistical history of baseball and the study of consumers converge. Data itself will only get you so far, and it may even steer you in the wrong direction. Like many baseball teams, brands are trying to figure out what they ought to measure in the first place.

When I finished Phillips’s book, I noticed one crucial difference between baseball players and consumers. In baseball, sabermetricians and official scorers don’t have to worry about the individual collecting the data influencing a batter’s slugging percentage. In market research, sometimes you need to consider how the data collection process may help to create the thing being measured. For example, if you ask someone about a neighbor, he or she might complain about a messy yard and loud dogs or rave about how nice it is to have a good friend next door. But regardless of what that person says, the response could be a new discrete attitude, pieced together on the fly.

Scorers have a different relationship with data. They collect it passively, much like your browser or cell-phone provider. Research shows that a brand’s NPS correlates with growth. But asking people if they would recommend a brand or service is like asking them to think about their neighbor. Their response may be a small act of creation, manufactured on the fly. Capturing how consumers really feel is so tricky because the act of observation has its own effects. How do you know you’re extracting a real attitude and not spurring one into existence?

Our project involving NPS was interesting because we benefited more from our experience as shoppers than as researchers. When we saw ourselves as shoppers, we imagined being in different retailers, which helped us see stores on a continuum. You go to some stores when you need to run an errand; you tend to shop faster and get annoyed more quickly in these establishments (convenience stores and supermarkets may fall into this category). You go to another set of stores when you don't necessarily plan on buying anything. I've spent hours browsing in a nearby bookstore and left happier than when I entered despite not buying anything. Some people like just being in an Apple Store.

What I realized about NPS is that I would recommend my favorite bookstore and my local convenience store—which is squarely in the "errand category" for me—at the same rate. They both get the job done. And yet the shopping experiences are radically different. We eventually wrote a question that controlled for this distinction and fielded it to hundreds of people in an online survey: "How much does shopping at the retailer feel like an errand to you?" We found that despite the brand's popularity among its customers, shopping at its stores felt more like an errand than shopping at its competitor's did. You could say that the errand question added a dimension to NPS much as slugging percentage and exit velocity added dimensions to batting average: it accounted for retailers that might be recommended equally but experienced differently.

When you think back to 2003, when Lewis published Moneyball, you can see how the revolution in data was framed as a clash between judgment and data. In reality, it’s a tension that cuts straight through the entire history of baseball, and it currently shapes how brands think about us as consumers. Brands want to know how to measure people. But, really, they want to know what they should measure in the first place. Phillips is right: “The data sciences are inescapably human sciences. Data are not only often about people, but also produced and interpreted by people.”