Editor’s Note: The following is a guest post from Jake VanderPlas, a data scientist who worked on the Graphic Science illustration in the October issue of Scientific American magazine.

One of the largest treasure troves of astronomical data comes from the Sloan Digital Sky Survey (SDSS), an ongoing scan of the firmament that began 15 years ago. Its catalogue covers 35 percent of the sky and contains multicolor observations of hundreds of millions of distinct galaxies, stars and quasars. If a person were to attempt to individually inspect each of these objects at a rate of one per second through the workday, it would be a full-time job lasting over 60 years!
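As a rough check on that estimate, here is a minimal sketch of the arithmetic. The object count, the eight-hour workday and the 250 workdays per year are all assumptions made for illustration; the catalogue figure above says only "hundreds of millions."

```python
def years_to_inspect(n_objects, seconds_per_object=1.0,
                     hours_per_workday=8.0, workdays_per_year=250):
    """Return the number of working years needed to inspect every object."""
    # Seconds of inspection time available in one working year.
    seconds_per_year = hours_per_workday * 3600.0 * workdays_per_year
    return n_objects * seconds_per_object / seconds_per_year


# 450 million is an assumed stand-in for "hundreds of millions" of objects.
print(years_to_inspect(450_000_000))  # prints 62.5
```

Under these assumptions the job takes a little over 60 working years, in line with the figure quoted above.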

Fortunately, such individual inspection is not how astronomers work. Instead, we use various specialized algorithms to automatically sift through and categorize this vast data set and dream up novel visualization schemes to make clear at a glance the relationships between thousands or millions of individual objects.

One of my favorite examples of this type of visualization involves a subset of the above data, the SDSS Moving Object Catalog, which my colleague Zeljko Ivezic has been instrumental in producing and collating. This catalogue contains detailed data on nearly half a million small asteroids orbiting our sun, allowing us not only to track the orbital path of individual asteroids but also to gain insight into the chemical composition and formation history of individual objects and the solar system as a whole. The following video simulation from Alex Parker gives a glimpse into the orbital characteristics of this data:

The SDSS data gives us much more than just the orbital dynamics. Multiband imaging gives us detailed measurements of the color of reflected sunlight off each of these asteroids. Just as on Earth our eyes can distinguish white quartz from dark basalt based on how they each reflect sunlight, the SDSS telescope can distinguish among different chemical compositions of asteroids based on how their surfaces reflect sunlight.

We can summarize this chemical information with two “color” measurements: color in the optical range and color in the near-infrared range. Combining this with the semimajor axis (a measure of the size of the orbit around the sun) and the inclination (a measure of how “tilted” the orbit is compared with Earth’s orbital plane) gives us a four-dimensional data set: four properties of each asteroid that contain information about its orbit and chemical composition.

Visualizing the fourth dimension
With this four-dimensional data set decided, we can now think about how to best visualize it. One-dimensional data fits on a number line; two-dimensional data can be plotted on a flat page or screen; three-dimensional data can be conceived as a 3-D plot, perhaps rotating on a computer screen; but how do you effectively plot four-dimensional data?

We can start simple, by splitting the data into chemical indicators on one hand and orbital indicators on the other:

This visualization is full of information. The left panel is known as a color-color diagram and distinguishes between broad classes of asteroid chemistries. The left-most clump in this panel is primarily carbonaceous (C-type) asteroids whereas the right-most clump is primarily silicaceous (S-type) asteroids. The faint downward extension of the right-most clump is V-type asteroids, known to be associated with the asteroid Vesta.

The right panel showing the orbital characteristics offers even more insight. Immediately we see that there is some intriguing structure to the data: clumps and clusters as well as specific regions that are devoid of any asteroids at all. These clumps are known as asteroid families, and the vertical voids near 2.5 astronomical units (AU) and 2.8 AU are areas of orbital resonance with Jupiter: In this particular region of the solar system, these resonance effects quickly kick any asteroids out of their orbits. (An astronomical unit is the mean Earth-sun distance, or about 149.6 million kilometers.)
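The location of these voids can be estimated from Kepler's third law, which ties an orbit's period T to its semimajor axis a through T² ∝ a³. The sketch below assumes the two voids correspond to the 3:1 and 5:2 resonances with Jupiter (the asteroid completes three or five orbits while Jupiter completes one or two) and uses an approximate semimajor axis of 5.204 AU for Jupiter. Neither detail comes from the text above, and the function name is purely illustrative.

```python
JUPITER_SEMIMAJOR_AXIS_AU = 5.204  # approximate value, assumed for this sketch


def resonance_semimajor_axis(asteroid_orbits, jupiter_orbits,
                             a_jupiter=JUPITER_SEMIMAJOR_AXIS_AU):
    """Semimajor axis (AU) where an asteroid completes `asteroid_orbits`
    orbits in the time Jupiter completes `jupiter_orbits`."""
    # Ratio of the asteroid's orbital period to Jupiter's.
    period_ratio = jupiter_orbits / asteroid_orbits
    # Kepler's third law: a scales as T to the 2/3 power.
    return a_jupiter * period_ratio ** (2.0 / 3.0)


for p, q in [(3, 1), (5, 2)]:
    a = resonance_semimajor_axis(p, q)
    print(f"{p}:{q} resonance near {a:.2f} AU")
```

With these assumed inputs, the two resonances land close to 2.5 AU and 2.8 AU, matching the vertical voids in the right panel.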

These are all interesting insights, but what we’d really like is to see intuitively how the chemistry reflected in the left panel is related to the orbital dynamics reflected in the right panel. Such a four-dimensional relationship is very difficult to capture.

Plotting all pairs of dimensions?
One common way to visualize high-dimensional data is to use a grid of multiple two-dimensional plots. In this way we can plot each pair of features against one another and look at the correlations. Of course, the two panels from above are just a subset of the six distinct plots (each with a mirror-image) created by this method:

This plot conveys a lot of information, and there are some intriguing pieces. For example, in the panel comparing near-infrared color to orbital inclination (top row, second from the left) we see a distinct clump of data: These are points that are clustered both in color and inclination. Further investigation shows this clump reflects the Vesta family, a chemically similar group of asteroids that also shares the same orbital inclination. We’ll return to these below.
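The count of six distinct panels follows from choosing two of the four properties at a time. Here is a small sketch of that enumeration; the feature labels are descriptive names chosen for illustration, not column names from the catalogue.

```python
from itertools import combinations

# Descriptive labels for the four properties; the names are illustrative.
FEATURES = ["optical color", "near-infrared color",
            "semimajor axis", "inclination"]


def distinct_pairs(features):
    """Return every unordered pair of features, one per distinct panel."""
    return list(combinations(features, 2))


pairs = distinct_pairs(FEATURES)
for x_name, y_name in pairs:
    print(f"{x_name} vs. {y_name}")
print(len(pairs))  # 6
```

Swapping the two axes of any pair gives its mirror-image panel, which is why the full grid shows each relationship twice.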

Colors as added dimensions
Another common high-dimensional visualization technique is to treat color as an added dimension. This way a standard two-dimensional plot can reflect three dimensions of information. Let’s try visualizing the four dimensions in this way: We’ll do two versions of the orbital inclination plot, using a different color scale in each plot:

In the left plot the colors of the points correspond to the optical color whereas in the right plot the colors correspond to the near-infrared color. With this enhancement, we’re really getting somewhere: The left panel makes clear that the clumps of asteroids in orbital space are generally grouped according to their optical colors, that is, their position on the carbonaceous-silicaceous spectrum. The right plot shows the Vesta group that we pointed out above: the group of asteroids near 2.4 AU with a blue near-infrared color.
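The idea of using color as a third dimension can be sketched as a function that maps a measured value onto a position between two endpoint colors. This is only an illustration of the technique, not the color scale used for the figures; the blue-to-red endpoints and the function name are assumptions.

```python
def value_to_rgb(value, vmin, vmax,
                 low_rgb=(0.0, 0.0, 1.0), high_rgb=(1.0, 0.0, 0.0)):
    """Linearly map `value` in [vmin, vmax] to an RGB color.

    Values at vmin get `low_rgb` (blue by default), values at vmax get
    `high_rgb` (red by default); anything outside the range is clamped.
    """
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp to the ends of the scale
    return tuple(lo + t * (hi - lo) for lo, hi in zip(low_rgb, high_rgb))


print(value_to_rgb(0.5, 0.0, 1.0))  # (0.5, 0.0, 0.5)
```

Each asteroid keeps its position in the orbital plot, and the color computed from its optical or near-infrared measurement carries the extra dimension. Plotting libraries generally offer built-in color maps that do the same job.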

Multicolor plot
Let’s put these all together. Rather than using two separate color scales to identify these asteroid groups, we can define a single two-dimensional color scale reflecting the asteroid chemistry and use these colors when plotting the same points in orbital space. The result is a plot very similar to the one that appeared in Parker et al., 2008, where this work was first reported:

This final plot offers a full, intuitive view of the relationships between the four measured asteroid characteristics. From this visualization, it becomes clear that asteroid chemistry (reflected in the color of the individual points defined by the left panel) is strongly related to the orbital distribution of asteroids (reflected in the clumping of asteroid families in the right panel). What this shows is that families of asteroids not only orbit near one another in space, but also have largely the same chemical composition!
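A two-dimensional color scale of the kind described above can be sketched by letting each chemical measurement drive different color channels. The channel assignment and the example ranges below are assumptions made for illustration and are not the scheme used in Parker et al.; the function names are hypothetical.

```python
def normalize(value, vmin, vmax):
    """Rescale `value` from [vmin, vmax] to [0, 1], clamping at the ends."""
    t = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, t))


def chemistry_to_rgb(optical, near_ir, optical_range, near_ir_range):
    """Map a pair of color measurements to a single RGB plotting color.

    Assumed channel assignment: the optical color moves a point from
    blue toward red, and the near-infrared color adds green.
    """
    u = normalize(optical, *optical_range)
    v = normalize(near_ir, *near_ir_range)
    return (u, v, 1.0 - u)


# Illustrative ranges only; real limits would come from the data.
print(chemistry_to_rgb(0.25, 0.75, (0.0, 1.0), (0.0, 1.0)))  # (0.25, 0.75, 0.75)
```

Because every combination of the two measurements gets its own color, points that share a color in the orbital plot share a chemistry as well.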

This observation lends support to the theory that asteroid families are formed by collisions of larger bodies. At some time in the past, two larger asteroids likely collided, shattering into hundreds or thousands of smaller bodies. Because these smaller bodies each came from the same source, they are chemically similar and continue to orbit in the same region for several hundred million years.

As the volume and complexity of data grow, novel visualization techniques like this are an important part of mining large data sets in search of such insights.

Editor’s Note: This post was created using the IPython notebook, an interactive computing environment that aids in producing fully reproducible scientific analysis. The original notebook is available for download on GitHub, and contains all the code and data necessary to create the figures in this post. To see a version of the final graphic as it appears in the October 2014 issue of Scientific American, click here.