A couple of months ago, the sun sported the largest sunspot we’ve seen in the last 24 years. This monstrous spot, visible to the naked eye (that is, without magnification, but with protective eyewear, of course), launched more than 100 flares. The number of spots on the sun ebbs and flows in a cycle that lasts roughly 11 years. Right now, the sun is in the most active part of this cycle, so we’re expecting lots of spots and lots of flares in the coming months.
Usually, the media focuses on the destructive power of solar flares: the chance that, one day, a huge explosion on the sun will fling a ton of energetic particles our way and fry our communication satellites. But there’s less coverage of how we forecast these events, much as we forecast the weather, so that we can prevent potential damage. How do you forecast a solar flare, anyway?
One way is to use machine learning programs, a type of artificial intelligence that learns automatically from experience. These algorithms gradually improve their mathematical models every time new data come in. To learn properly, however, they require large amounts of data. Scientists had no solar data on this scale before the 2010 launch of the Solar Dynamics Observatory (SDO), a sun-watching satellite that downlinks about one and a half terabytes of data every day, more than any other satellite in NASA history.
Solar flares are notoriously complex. They occur in the solar atmosphere, above sunspots on the surface. Sunspots generally come in pairs, and each pair acts like a bar magnet: one spot behaves like a north pole and the other like a south pole. Given that there are lots of sunspots, that various layers of the sun rotate at different speeds, and that the sun itself has a north and south pole, the magnetic field in the solar atmosphere gets pretty messy. Like a rubber band, a highly twisted magnetic field will eventually snap, releasing a lot of energy in the process. That’s a solar flare. But sometimes twisted fields don’t flare, sometimes flares come from fairly innocuous-looking sunspots, and sometimes huge sunspots never do a thing.
We don’t fully understand the physics of how solar flares occur. We have ideas (we know flares are magnetic in nature), but we don’t really know how they release so much energy so fast. In the absence of a definitive physical theory, our best hope for forecasting solar flares lies in scrutinizing our vast data set for observational clues.
There are two general ways to forecast solar flares: numerical models and statistical models. In the first, we take the physics that we do know, code up the equations, run them forward in time and get a forecast. In the second, we use statistics. We answer questions like: What’s the probability that an active region with a huge sunspot will flare, compared with one that has a small sunspot? To do this, we build large data sets full of features (such as the size of a sunspot or the strength of its magnetic field) and look for relationships between these features and solar flares.
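To make the statistical approach concrete, here is a minimal Python sketch of the simplest version of that question: what fraction of active regions flared, grouped by sunspot size? The function name `flare_rate_by_size`, the single bin edge at 500 and the toy records are all hypothetical, invented for illustration; real studies use many more features and far more regions.

```python
import bisect
from collections import defaultdict


def flare_rate_by_size(records, bin_edges):
    """Empirical flare probability for each sunspot-size bin.

    records   -- iterable of (sunspot_area, flared) pairs, flared being True/False
    bin_edges -- sorted area boundaries; bin 0 lies below bin_edges[0], and so on
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [number flared, total]
    for area, flared in records:
        b = bisect.bisect_right(bin_edges, area)  # which size bin this region falls in
        counts[b][0] += int(flared)
        counts[b][1] += 1
    # Fraction of regions in each bin that flared: an estimate of P(flare | size bin).
    return {b: n_flared / n_total for b, (n_flared, n_total) in sorted(counts.items())}


# Hypothetical toy records: (sunspot area in millionths of a hemisphere, flared?)
records = [
    (50, False), (80, False), (120, True), (60, False),    # small spots
    (900, True), (700, True), (1200, False), (850, True),  # large spots
]
rates = flare_rate_by_size(records, bin_edges=[500])
print(rates)  # prints: {0: 0.25, 1: 0.75}
```

In this made-up sample, one in four small-spot regions flared versus three in four large-spot regions. A statistical model generalizes exactly this kind of comparison to many features at once.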
Machine learning algorithms can help here. We use them everywhere. Biometric watches run them to predict when we should wake up. They’re better than doctors at predicting rare genetic disorders. They’ve identified paintings that have influenced artists throughout history. Scientists find machine learning algorithms so universally useful because they can identify non-linear patterns, meaning patterns that can’t be captured by a straight line (or, with more features, a flat plane). That’s tough to do, but it’s important, because lots of patterns are non-linear.
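A toy example of why non-linear patterns are tricky: suppose, purely hypothetically, that regions flared only when some single feature fell within a middle range. With one feature, any straight-line rule boils down to a single cutoff ("flare above t" or "flare below t"). This Python sketch checks every cutoff against a rule that looks for a window instead; the data, `band_rule` and the other names are invented for illustration.

```python
# Hypothetical toy pattern: one feature x (0..9) where only mid-range values flare.
xs = list(range(10))
labels = [1 if 3 <= x <= 6 else 0 for x in xs]


def accuracy(rule):
    """Fraction of the toy points that a rule labels correctly."""
    return sum(rule(x) == label for x, label in zip(xs, labels)) / len(xs)


# With one feature, every "straight-line" rule is a single cut: flare above t, or below t.
cuts = [t + 0.5 for t in range(-1, 10)]
linear_rules = [lambda x, t=t: int(x > t) for t in cuts]
linear_rules += [lambda x, t=t: int(x < t) for t in cuts]
best_linear = max(accuracy(rule) for rule in linear_rules)


def band_rule(x):
    """A non-linear rule: flare only inside a window of x."""
    return int(3 <= x <= 6)


print(best_linear, accuracy(band_rule))  # prints: 0.7 1.0
```

The best cutoff gets 7 of the 10 points right, while the window rule gets all 10. A learner restricted to straight lines can never fully capture this kind of pattern, which is why methods that can find non-linear boundaries are valuable.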
We’ve used machine learning algorithms to forecast solar flares using SDO’s vast data set. First, we built a database of every active region SDO has ever observed. Because it’s historical data, we already know whether each of these active regions flared. The learning algorithm then analyzes active region features, such as the size of a sunspot, the strength of its associated magnetic field and the twist of its field lines, to identify the general characteristics of flaring active regions.
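The database-building step might look something like this minimal Python sketch. It assumes, hypothetically, that each active region has already been reduced to three numbers (sunspot size, field strength and twist) and that flares come from a catalog listing each flare’s region and GOES class. `ActiveRegion`, `build_dataset` and the choice to count only M- and X-class flares as "flaring" are illustrative assumptions, not the actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class ActiveRegion:
    """One historical active region; field names and units are hypothetical."""
    region_id: int
    area: float            # sunspot size
    field_strength: float  # strength of the associated magnetic field
    twist: float           # how twisted the field lines are


GOES_ORDER = "ABCMX"  # GOES flare classes from weakest to strongest


def build_dataset(regions, flare_catalog, min_class="M"):
    """Turn historical regions plus a flare catalog into features X and labels y.

    flare_catalog -- iterable of (region_id, goes_class) pairs, e.g. (1, "X1.6")
    A region is labelled 1 if it produced at least one flare of min_class or stronger.
    """
    threshold = GOES_ORDER.index(min_class)
    flaring_ids = {
        rid for rid, goes_class in flare_catalog
        if GOES_ORDER.index(goes_class[0]) >= threshold
    }
    X = [[r.area, r.field_strength, r.twist] for r in regions]
    y = [1 if r.region_id in flaring_ids else 0 for r in regions]
    return X, y


# Hypothetical usage with three made-up regions.
regions = [
    ActiveRegion(1, area=120.0, field_strength=0.3, twist=0.1),
    ActiveRegion(2, area=950.0, field_strength=0.9, twist=0.7),
    ActiveRegion(3, area=400.0, field_strength=0.5, twist=0.2),
]
catalog = [(1, "C3.2"), (2, "M5.0"), (2, "X1.1")]
X, y = build_dataset(regions, catalog)
print(y)  # prints: [0, 1, 0]
```

Region 1 produced only a C-class flare, so under the M-class threshold it counts as non-flaring. Where to draw that line is a design choice that changes what the algorithm learns to predict.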
The algorithm starts by making a guess. Let’s say its first guess is that a tiny sunspot with a weak magnetic field will produce a huge flare. Then it checks the answer. Whoops, nope. So the algorithm tweaks the way it guesses, and next time around it makes a different guess. Through trial and error (hundreds of thousands of guesses and checks), the algorithm figures out which features correspond to flaring active regions. The result is a self-taught algorithm that we can apply to real-time data.
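The guess-check-tweak loop can be illustrated with the perceptron, one of the oldest and simplest learning algorithms. This is a minimal Python sketch on invented toy data, not necessarily the algorithm used for SDO forecasting. The perceptron can only learn straight-line boundaries, so it is chosen here because its loop mirrors the guess, check and tweak steps so directly, not for its power. The learning rate `lr`, `max_epochs` and the two features are assumptions.

```python
def predict(w, b, x):
    """Guess: 1 (flare) if the weighted sum of features is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(X, y, lr=0.1, max_epochs=100):
    """Repeatedly guess, check against the known answer, and tweak the weights."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(X, y):
            error = target - predict(w, b, x)  # check: 0 if right, +1 or -1 if wrong
            if error != 0:
                mistakes += 1
                # tweak: nudge the weights toward the right answer for this region
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
        if mistakes == 0:  # a full pass with no wrong guesses: stop
            break
    return w, b


# Hypothetical toy features per region: [sunspot size, field strength], scaled to 0..1.
X = [
    [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],  # regions that did not flare
    [0.8, 0.7], [0.9, 0.9], [0.7, 0.9], [0.6, 0.8],  # regions that flared
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # prints: [0, 0, 0, 0, 1, 1, 1, 1]
print(predict(w, b, [0.85, 0.8]))     # a new, unseen region: prints 1
```

On this toy set the loop settles after a few passes. Once trained, `predict` is the part you would run on fresh observations.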
Expanding such efforts could help us give better notice of impending solar flares. So far, studies have found that machine learning algorithms forecast flares better than, or at worst as well as, numerical or statistical methods. That’s kind of a phenomenal result in and of itself. These algorithms are never told the physics; they simply look for patterns in the data, and they are so general that you can use the same algorithm (on a different data set) to identify genetic disorders. Yet they perform as well as any other method used so far to forecast solar flares.
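Saying one method forecasts "better" than another depends on how forecasts are scored. The paragraph above doesn’t name a metric. One widely used in flare forecasting is the true skill statistic (TSS): the hit rate minus the false-alarm rate. This hedged Python sketch, using made-up counts, shows why plain accuracy can mislead when flares are rare. Treat it as an assumption about how such comparisons are made, not a description of the specific studies.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """Hit rate minus false-alarm rate: 0 for a no-skill forecast, 1 for a perfect one."""
    return tp / (tp + fn) - fp / (fp + tn)


def accuracy(tp, fn, fp, tn):
    """Fraction of all forecasts (flare or no flare) that were correct."""
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical counts for 100 regions, 10 of which actually flared.
model = dict(tp=8, fn=2, fp=10, tn=80)  # a forecaster that catches most flares
never = dict(tp=0, fn=10, fp=0, tn=90)  # always forecasts "no flare"

print(accuracy(**model), true_skill_statistic(**model))
print(accuracy(**never), true_skill_statistic(**never))
```

Here the "never flare" forecast has higher accuracy (0.9 versus 0.88) yet a TSS of zero, while the forecaster that catches 8 of 10 flares scores about 0.69.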
And if we have more data? Who knows. Although we already have tons of data (SDO has been running for four and a half years), there haven’t been many flares during that time, because we’re in the quietest solar cycle of the century. That’s all the more reason to keep collecting data and keep the algorithms busy.