July 17, 2012

5 Sigma: What's That?

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American.

Chances are, you heard this month about the discovery of a tiny fundamental physics particle that may be the long-sought Higgs boson. The phrase five-sigma was tossed about by scientists to describe the strength of the discovery. So, what does five-sigma mean?

In short, five-sigma corresponds to a p-value, or probability, of 3x10^-7, or about 1 in 3.5 million. This is not the probability that the Higgs boson does or doesn't exist; rather, it is the probability that if the particle does not exist, the data that CERN scientists collected in Geneva, Switzerland, would be at least as extreme as what they observed. "The reason that it's so annoying is that people want to hear declarative statements, like 'The probability that there's a Higgs is 99.9 percent,' but the real statement has an 'if' in there. There's a conditional. There's no way to remove the conditional," says Kyle Cranmer, a physicist at New York University and member of the ATLAS team, one of the two groups that announced the new particle results in Geneva on July 4.
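The figure quoted above can be reproduced in a few lines. The following is a minimal sketch, assuming the "five-sigma" value is read as the area in the upper tail of a standard normal (bell curve) distribution; the function name `one_sided_p` is made up for illustration and is not taken from any CERN software.

```python
import math

def one_sided_p(n_sigma):
    """Probability that a standard normal variable lands more than
    n_sigma standard deviations above its mean (upper tail only)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

p = one_sided_p(5)
# Roughly 2.87e-07, i.e. about 1 in 3.5 million.
print(f"p = {p:.3g}, about 1 in {1 / p:,.0f}")
```

Note that this only converts a number of sigmas into a tail probability. It says nothing about how the experiments arrived at five sigma in the first place.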

Scientists use p-values to judge how compatible their data are with a hypothesis. In an experiment comparing some phenomenon A to phenomenon B, researchers construct two hypotheses: that "A and B are not correlated," which is known as the null hypothesis, and that "A and B are correlated," which is known as the research hypothesis.

The researchers then assume the null hypothesis (because it's the most conservative supposition, intellectually) and calculate the probability of obtaining data as extreme as, or more extreme than, what they observed, given that there is no relationship between A and B. This calculation, which yields the p-value, can be based on any of several different statistical tests. If the p-value is low, for example 0.01, this means that there is only a small chance (one percent for p=0.01) that data this extreme would have turned up by chance alone if there were no correlation. Usually there is a pre-established threshold in a field of study for rejecting the null hypothesis and claiming that A and B are correlated. Thresholds of p=0.05 and p=0.01 are very common in many scientific disciplines.
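One way to see this recipe in action is a permutation test, which is just one of the many statistical tests a researcher might choose. The sketch below assumes the null hypothesis means every pairing of the A and B measurements is equally likely. It then counts how often a random pairing looks at least as correlated as the observed one. The data values and the function name `permutation_p_value` are invented purely for illustration.

```python
from itertools import permutations

def permutation_p_value(a, b):
    """Exact one-sided permutation p-value for a positive association.

    Under the null hypothesis (no relationship), every re-pairing of b
    with a is equally likely. The sum of products is used as the test
    statistic; for fixed values it rises and falls with the correlation.
    """
    def score(xs, ys):
        return sum(x * y for x, y in zip(xs, ys))

    observed = score(a, b)
    total = 0
    as_extreme = 0
    for shuffled in permutations(b):
        total += 1
        if score(a, shuffled) >= observed:
            as_extreme += 1
    return as_extreme / total

a = [1, 2, 3, 4, 5, 6]  # made-up measurements of phenomenon A
b = [2, 1, 4, 3, 6, 5]  # made-up measurements of phenomenon B
p = permutation_p_value(a, b)
print(f"p-value = {p:.4f}")
print("reject null at 0.05" if p < 0.05 else "cannot reject null at 0.05")
```

Enumerating every pairing is only practical for tiny samples like this one. Larger data sets would typically rely on random shuffles or a formula-based test instead.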

High-energy physics requires even lower p-values to announce evidence or discoveries. The threshold for "evidence of a particle" is three-sigma, which corresponds to p=0.0013 when only the high tail of the distribution is counted, and the standard for "discovery" is five-sigma, or p=0.0000003.

The reason for such stringent standards is that several three-sigma events have later turned out to be statistical anomalies, and physicists are loath to declare discovery and later find out that the result was just a blip. One factor is the "look-elsewhere effect": when analyzing very wide energy intervals, it is likely that you will see a statistically improbable event at some particular energy level. As a concrete example, there is just under a 0.1 percent chance of flipping an ordinary coin 100 times and getting at least 66 heads. But if a thousand people flip identical coins 100 times each, it becomes more likely than not that at least one of them will get 66 or more heads; one of those events on its own should not be interpreted as evidence that the coins were somehow rigged.
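The coin-flip example can be checked directly with exact binomial arithmetic. This is a minimal sketch, assuming fair coins and independent flippers; the function name `tail_probability` is made up for illustration.

```python
from math import comb

def tail_probability(flips, min_heads):
    """Exact chance of at least min_heads heads in `flips` fair coin tosses."""
    favorable = sum(comb(flips, k) for k in range(min_heads, flips + 1))
    return favorable / 2 ** flips

# One person flipping 100 times: a bit under 0.1 percent.
p_single = tail_probability(100, 66)

# A thousand independent flippers: chance that at least one sees 66 or more heads.
p_any = 1 - (1 - p_single) ** 1000

print(f"one flipper:         {p_single:.4%}")
print(f"any of 1000 flippers: {p_any:.1%}")
```

The same rare outcome that would be surprising for a single flipper becomes unremarkable once enough people (or enough energy bins) are examined, which is the look-elsewhere effect in miniature.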

So where do the sigmas come in? The Greek letter sigma is used to represent standard deviation. Standard deviation measures the spread of data points around a mean, or average, and can be thought of as how "wide" the distribution of points or values is. A sample with a high standard deviation is more spread out, meaning it has more variability, and a sample with a low standard deviation clusters more tightly around the mean. For example, a plot of the heights of all dogs would probably have a larger standard deviation than a plot of the heights of dogs from a particular breed, even if that breed had the same average height as dogs in general.

For particle physics, the sigma used is the standard deviation arising from a normal distribution of data, familiar to us as a bell curve. In a perfect bell curve, 68% of the data is within one standard deviation of the mean, 95% is within two, and so on.
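Those percentages follow from the shape of the bell curve itself. As a small sketch, assuming a perfect normal distribution, the fraction of data within a given number of standard deviations can be computed from the error function (the function name `fraction_within` is chosen here for illustration):

```python
import math

def fraction_within(n_sigma):
    """Fraction of a normal distribution lying within n_sigma
    standard deviations of the mean (both sides combined)."""
    return math.erf(n_sigma / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.2%}")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```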

In the case of the results announced last week, the process was more complicated than simply taking the results from one experiment and measuring the deviation of the data from the expected background levels; data came from many different channels, and each one had a different expected background signal. In addition, there were uncertainties about the measurements from the detectors that had to be taken into account. Researchers used a complex formula to combine all of these variables and calculate a p-value. This value was then translated into a number of sigmas above the mean, because the number of collisions observed at the energy of the newly discovered particle was higher than the expected background.
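The last step, turning a p-value into a number of sigmas, is the only part simple enough to sketch here. The example below assumes the conversion uses the upper tail of a standard normal distribution. It does not attempt to reproduce the experiments' combination of channels and uncertainties, and the function name `p_to_sigma` is hypothetical.

```python
from statistics import NormalDist

def p_to_sigma(p_value):
    """Number of standard deviations above the mean whose upper-tail
    probability equals p_value (requires 0 < p_value < 1)."""
    # By symmetry, the point with lower-tail area p is the mirror image
    # of the point with upper-tail area p.
    return -NormalDist().inv_cdf(p_value)

for p in (0.0013, 0.0000003):
    print(f"p = {p:g}  ->  {p_to_sigma(p):.2f} sigma")
```

Working from the lower tail and flipping the sign avoids subtracting a tiny p-value from 1, which would lose precision for very small probabilities.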

This final point led to some confusion in the media about the p-value associated with five-sigma. In a normal distribution, data is symmetrically distributed on both sides of the mean. It is twice as likely for data to be in either the high or low tail as in just the high tail, so some outlets reported that five-sigma corresponded to a p-value of 0.0000006, or 1 in 1.7 million, rather than the correct value of 0.0000003, or 1 in 3.5 million. For further discussion of this subtlety, see this Understanding Uncertainty blog post.
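The factor of two is easy to see side by side. This is a brief sketch, assuming a standard normal distribution; the function name `tail_p` and its `two_sided` flag are invented for illustration.

```python
import math

def tail_p(n_sigma, two_sided=False):
    """Tail probability beyond n_sigma standard deviations.

    two_sided=False counts only the high tail (the convention used for
    the Higgs result); two_sided=True counts the high and low tails.
    """
    one_tail = 0.5 * math.erfc(n_sigma / math.sqrt(2))
    return 2 * one_tail if two_sided else one_tail

high_only = tail_p(5)
both_tails = tail_p(5, two_sided=True)
print(f"high tail only: {high_only:.2e} (about 1 in {1 / high_only / 1e6:.1f} million)")
print(f"both tails:     {both_tails:.2e} (about 1 in {1 / both_tails / 1e6:.1f} million)")
```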

The excitement about the Higgs discovery led the two teams to announce their results before all the data had been analyzed. Going forward, after both teams' analyses are complete, the groups will combine their observations. Although the two experiments are based on similar physical principles, it is not trivial to combine their data in a meaningful way. If your wallet were filled with both U.S. dollars and euros (or Swiss francs if you were visiting CERN), you couldn't simply add the numbers on the bills to find out how much money you had; you would have to perform some conversions first. The groups will use what Cranmer calls "collaborative statistical modeling" to combine the results of the two experiments (ATLAS and CMS). This approach has already been used to perform "conversions" on data sets within each team's experiment. When complete, these analyses will convey a more accurate sense of the strength of the new evidence and determine whether the observed data is consistent with the Higgs boson physicists seek.