May 12, 2014

A Look under the Hood of Online Data Visualization

By Jen Christiansen

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

Andy Kirk (of Visualising Data) recently published a clever image-driven post in which he uses automobiles to make a series of points about the practice of data visualization. Interestingly, cars also came to my mind when reflecting upon a data visualization gathering held a few weeks ago.

OpenVis Conference is an annual event (now in its second year) focused on “best practices for data processing, storytelling, visual design, code structure and implementation using the latest and greatest technology and tools on the Open Web.” I was a speaker this year, tagged with “design” and “scientific visualization.” My wheelhouse. But although I collaborate with and edit freelance visualizers who use the latest and greatest technology and tools, I am personally rooted in a print and illustrative tradition. I don’t program. (Status update pending.) I was a little nervous about how relevant my experiences and perspective would be to this particular group of conference participants.

While listening to 20 individuals present on topics ranging from coding for the news, color theory, using IPython Notebooks, and visualizing nothing (as in zero) to understanding machine learning with D3, it occurred to me that this was, perhaps, the most holistic data visualization conference I had ever attended.

So where do cars fit in? Forgive me for a moment, as I drive an extended metaphor into the ground.

In my experience, many conferences focus on how cars (data visualizations) look. Generally they are presented out of context, on a pristine showroom floor. Perhaps with a nod to fuel efficiency and some discussion of the target demographic. But at OpenVis Conference, we popped the hood and discussed the inner workings, then stepped back and chatted about the aesthetics, went on a few virtual test-drives through live demos, and even imagined the places it might take us. It’s nice to be reminded that all of those things are intricately linked. The engine wouldn’t be worth much without the chassis. And even if a car looks awesome, it’s not very useful if you can’t drive it. What a treat to spend several days chatting with folks from many different areas of expertise, interested in learning more about the whole.

What follows is an abridged version of my OpenVis Conference presentation, a glimpse at the process and guiding philosophy behind data visualizations for Scientific American. For the full program (videos are forthcoming), see the conference website. For thoughts from the meeting, see this Storify list of my tweets, or search #OpenVisConf on Twitter.

* * * *

Rather than focus on the tools used to create graphics for the magazine, I’ll be addressing things from an art director’s point of view. I have the distinct pleasure of hiring designers on a project-by-project basis, allowing me to rely on the talent and constantly evolving skills of freelance visualizers. My job lies in helping to match the right artist with the right data set, and then nudging things along, always with an eye toward the graphic’s ultimate audience: our science-savvy but non-specialist readers.

Before I talk about specific projects like this:

I’d like to step back a little, and provide some context. The collaborations that I’ll be speaking about are entirely driven by the ultimate context. And by that, I don’t just mean final medium, such as ink on paper or pixels on screens, but also the meta-context, including Scientific American’s international editions, our legacy, and the realities of our workflow.

Scientific American is the oldest continuously published magazine in the United States. It was founded in 1845 as a 4-page weekly devoted primarily to inventions.

In 1921 the scope officially broadened. It shifted from a weekly inventor’s paper to a monthly periodical of popular science.

Several redesigns occurred during the decades that followed, but the next watershed moment occurred in 1948 when the magazine expanded coverage beyond advances in industry, and embraced the broader world of applied and theoretical science. The magazine would communicate science in clear language to the interested lay person.

The English-language website was launched in 1996 (this is how it appears now).

Followed by the next major magazine redesign, in 2001.

The most recent redesign was in 2010.

And our tablet edition for the iPad was launched in 2012.

We have 14 international partners, who have access to our original content but also develop articles specifically for their own audiences, so it’s not a page-for-page direct translation.

Meticulous illustrations of inventions and mechanical properties were a trademark of the early issues.

And although many of those diagrams were based on exact measurements, I think this diagram on inertia, momentum, and projection can be considered the first proper data-based chart in the magazine. Perhaps that’s a bit generous, considering it portrays an impression of the values more than an exact representation of them. But the fact remains that it exists specifically to communicate a concept based on measures.

This example is pulled from an issue after the 1948 redesign I mentioned earlier. This is the era that resonates most with me: classic science communication, in which a scientist, a journalist, and an artist collaborate to reach non-specialist audiences.

This style continued for several decades. As an announcement leading up to the 1948 redesign stated, “…the Scientific American will employ pictures to communicate the science precisely and plainly.” This 1975 chart pair on cancer does just that.

But in a few more decades, plain would be out of vogue. The graphics in the 2001 redesign issue mark a distinct departure. The charts are broken into smaller, more digestible nuggets. And the introduction of non-content-bearing details like a third dimension and drop shadows suggests that the goal of the graphics is no longer to simply convey the information as clearly as possible. They are also being used as devices to activate the page and entice readers.

Rather than dismiss the 3D pie charts as misguided, what can we learn from this conscious choice to shift from plain and precise, to self-consciously dynamic?

After all, favoring illustrative details at the expense of a clear read of the data is nothing new.

What do these two eras of the magazine have in common (above)? In both cases, the magazine was actively trying to engage a broader audience. An honorable and necessary goal. Which leads me to today’s philosophy towards data visualization in the magazine. My goal is to help produce graphics that honor the data, but also entice the reader. That means embracing new forms, when appropriate, and never underestimating the power of a welcoming gesture, such as these bee illustrations.

Now, let’s dive into some specific examples. I’ll start with two graphics from the September 2013 single-topic issue on food. First, the flavor network. This was a print-first project that would accompany a somewhat playful introduction to the full issue. The graphic would be based on results published in a peer-reviewed journal.

You may recognize this figure, from the original paper in Scientific Reports, published in 2011. It connects food items that share chemical compounds. The researchers then went on to cross check these connections against recipe databases, to see if there was a relationship between often-paired ingredients, and their chemical compound overlap. As it turns out, Western cuisine does tend to pair strongly related ingredients together. But classic recipes from East Asia do not. Our goal was to create an image that would bring some of the researchers’ conclusions to a new audience. And perhaps reveal the patterns in an even more intuitive and engaging way.

The paper was open. I could download the initial data set immediately to review. And it was a signal that the authors would likely be open to and excited about collaborating. Which they were. Scientists Yong-Yeol Ahn and Sebastian Ahnert were really helpful in suggesting a filtering method and reviewing the sketches. With data in hand, I could go ahead and carve out a space for the graphic in layout. I reached out to freelancer Jan Willem Tulp to see if he was interested in the project.

A reminder–I’ll be speaking about these projects from the art director’s point of view. Not the artist’s. When possible, I’ll also direct you to some resources that will help flesh out the story. In this case, check out Jan Willem Tulp’s presentation from the Visualized conference in NY. I don’t believe the video is live yet–but the link above should work when it does become available.

For this project, my directive was pretty open-ended. Take the data set behind the original flavor network diagram, and develop a one-page variation on it for our print magazine. If interest builds, and if an interactive version of that graphic makes sense, then we’d negotiate a companion web item.

This is the first sketch that Jan Willem sent to me. You’ll see from his Visualized presentation that it was definitely not his first attempt.

I try to get concept sketches into layout as soon as possible. It became clear that I’d need more than one page to present the data and set it up with text. Coordinating with the whole article team early on would increase my odds of being able to spill over the gutter for an introduction and annotations.

Back to the sketch. The original flavor network already did a great job of highlighting the topology. Jan Willem could improve on it by sorting or grouping so that it would be easier to read that information.

He sorted the ingredients into columns by category.

Each black dot is an ingredient. The size of the dot corresponds to how often that ingredient appeared in the recipe database.

The blue circle paired with each black dot represents the total number of links with other ingredients.

Gray lines connect ingredients with shared chemical compounds in the same category. Orange lines connect ingredients with shared chemical compounds across categories.

The vertical position of the dot is based on the ingredient’s mean shared compound value for all the links that an ingredient has. This was the sticking point. In some ways, it’s a logical metric to sort by. Ingredients with lots of strong relationships to other ingredients rise to the top. But it’s kind of a complicated measure. It takes a few mental leaps.

The overall structure and plan felt right. But the metric behind the vertical position of each dot wasn’t very intuitive. So I asked for variations…and Jan Willem sent lots of options. I notated his screenshots with a description on the right to help solidify the logic in my own mind, and to more efficiently communicate with the article editor. I won’t get into all the details about each sketch, but thought you might like seeing some of the options that followed.

This option was our favorite. Vertical position is based upon the total number of connections to other ingredients. (Not the strength of those links). The blue outlined circles that originally held that information were no longer needed.
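For readers who think in code, the difference between the two sort metrics can be sketched roughly as follows. This is a minimal illustration, not the code behind the published graphic: the ingredient names, shared-compound counts, and function names are all invented here, and it assumes each pair of ingredients is listed only once.

```python
from collections import defaultdict

# Hypothetical links: (ingredient_a, ingredient_b, shared_compound_count).
# Values are invented for illustration, not taken from the published data set.
LINKS = [
    ("tomato", "cheese", 40),
    ("tomato", "basil", 10),
    ("tomato", "garlic", 10),
    ("coffee", "beef", 90),
]

def link_weights(links):
    """Map each ingredient to the shared-compound counts on its links."""
    weights = defaultdict(list)
    for a, b, shared in links:
        weights[a].append(shared)
        weights[b].append(shared)
    return dict(weights)

def mean_shared_compounds(links):
    """Metric from the first sketch: average link strength per ingredient."""
    return {name: sum(w) / len(w) for name, w in link_weights(links).items()}

def connection_count(links):
    """Metric we settled on: number of links per ingredient."""
    return {name: len(w) for name, w in link_weights(links).items()}

def rank(metric):
    """Ingredients ordered top to bottom by a metric; ties broken by name."""
    return sorted(metric, key=lambda name: (-metric[name], name))
```

In this toy data, ranking by mean shared compounds puts a single strong pairing at the top, while ranking by connection count puts the most widely linked ingredient at the top, which is the easier idea to explain in a caption.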

Once the structure settled in, we turned to fine tuning the aesthetics, informed by the color palette and style that was emerging for the full special issue. Here’s the final article.

You’ll note that we added annotations to bring attention to a few points we found interesting, but perhaps most critically, we included a large “How to Read This Graphic” panel, in which we walk the reader through the symbols used in the graphic in a very conversational way. This approach emerged pretty organically from the verbal explanation I had to provide whenever I showed a colleague the graphic for the first time. And it was more inviting than a small key or legend in the corner.

While I was fine-tuning labels, working through fact-check questions, and preparing the print version for press, Jan Willem turned his attention to developing an interactive version for the website. The best option, based simply on our budget and deadline schedule, would be to start with the print version and allow readers to select and view by ingredient. That way they could clearly see all the connections, by eliminating overlapping lines.

Selecting by ingredient seemed logical, but a drop-down list for nearly 400 ingredients was unwieldy, and a complete list on screen was not legible. Fish-eye distortion of the list was still too dense to efficiently navigate. So what about simply clicking on the blue ingredient dots? It turned out that many selections only show a very sparse network. Of all 381 ingredients, 148 have no links outside of their category, and 150 have just 1 link outside of their category. So only 83 ingredients would have two or more pink connector lines. Randomly clicking on ingredients might not yield a very satisfying experience.
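A tally like the one above can be sketched in a few lines. This is an assumed, simplified illustration rather than the analysis actually run for the project; the ingredients, categories, and function names are made up for the example.

```python
from collections import Counter

# Hypothetical data, invented for illustration: ingredient -> category,
# plus undirected links between ingredients that share compounds.
CATEGORY = {
    "tomato": "vegetable",
    "garlic": "vegetable",
    "basil": "herb",
    "cheese": "dairy",
    "butter": "dairy",
}
LINKS = [
    ("tomato", "garlic"),  # same category
    ("tomato", "basil"),   # cross-category
    ("tomato", "cheese"),  # cross-category
    ("cheese", "butter"),  # same category
]

def cross_category_links(category, links):
    """Count, for every ingredient, its links that leave its own category."""
    counts = {name: 0 for name in category}
    for a, b in links:
        if category[a] != category[b]:
            counts[a] += 1
            counts[b] += 1
    return counts

def bucket(counts):
    """Group ingredients into those with 0, 1, or 2+ cross-category links."""
    tally = Counter()
    for n in counts.values():
        tally["2+" if n >= 2 else str(n)] += 1
    return dict(tally)
```

Applied to the real data set, the three buckets would hold the 148, 150, and 83 ingredients mentioned above, which together account for all 381.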

So after exploring a few other ideas as well, Jan Willem suggested that rather than selecting by individual nodes, why not select by row? The reader would be less likely to stumble across a single boring ingredient. But this approach would still filter out many lines, so that individual connections would be easier to follow.

Click here to see the final interactive graphic.

Circling back to the other food-related item…
In this case, the assignment was much more open-ended and a bit out of our comfort zone. A digital-first, reader-directed exploration tool, drawing from a huge data set, but without pre-analyzed conclusions. I asked the Office for Creative Research for food-related suggestions.

Jer Thorp and Ben Rubin pitched several ideas. The magazine team was most interested in data about produce shipments into and around the U.S., based on daily reports from the U.S. Department of Agriculture.

Here are some preliminary explorations of the data over time and space.

And this is how the final data ended up being displayed. Readers could choose the produce item by using a dropdown menu, and then customize the time period by sliding date tabs at the bottom.

Three produce items could also be compared against one another, on the same screen….

And total import volume over time could be explored on this interactive map.

A few details still needed to be sorted for the interactive item, but the reality of the press schedule meant that we needed to turn our attention to the print item. An early concept sketch contained some really neat ideas (left), but was overly ambitious for a single static page. So we tried deleting some side stories, and clicked the maps into a small multiples grid (right).

Here’s the final page, as it appeared in the magazine. For a closer look, see the static web version and final interactive. For thoughts from one of the artists, see Sandra Rendgen’s interview with Jer Thorp.

So far I’ve talked about a print-first project and a digital-first project. But sometimes, the needs of print and web are so different that it makes sense to approach the same data set differently for each medium. For our 2012 “State of the World’s Science” special report, I hired artist Arno Ghelfi for the print spread, specifically for his dynamic and bold style. For the web, however, I could embrace a quieter aesthetic, as evidenced by the interactive by Jan Willem Tulp.

Here’s a closer look at the print spread. The center spiral of bubbles represents the “top 25 countries in science” as measured by an index based on research paper output. The lists at bottom left show rankings for a few other measures, such as number of patents issued. Yes, we could have made this spread work harder in terms of density of information. But, in this case, it all came down to context.

The full section was very text heavy. The central graphic spread not only needed to show the data, it also needed to act as a welcoming entry point into the section. Yes, data visualizations should honor the data. But they also need to be mindful of their context.

On the web, the graphic would be presented as its own module. The context was very different. Here, we could focus on presenting more information. We’d start with country rankings, but the reader could also see absolute numbers.

Sometimes, no matter how much it pains me, an interactive web version of a static graphic might not be the best use of limited resources. Although Moritz Stefaner would have been supremely capable of creating an interactive companion to this print graphic, I had to be a bit realistic about things. The page is based on a neat nugget of a project, in which scientist Laura Burkle and her team compared their bee and flower interaction data with a similar study from the same site conducted over a hundred years ago. It was a perfect candidate for a static one-page stand-alone graphic in the magazine. But I think the scope was a little too narrow to warrant a full-blown interactive.

The text editor and I were captivated by the study when doing research for a different, broader article on native bees. We set aside this figure from that original paper, to revisit later. I couldn’t help but wonder how someone like Moritz might re-imagine the same data set.

Moritz wrote about this graphic on his blog, so I won’t attempt to rehash that now. Instead, I’ll pick things up after he reached this point (below). After many concept sketches, he settled into basing the structure on the bees. Here, each pale yellow circle represents a bee species active in Carlinville, Illinois in both the late 1800s and 2010. Pink circles represent bee species that were active in the 1800s, but are now locally extinct. Each tab position within those circles represents one of 26 plant species pollinated by this group of bees. The color and size of that tab holds information about the changing nature of the bee and plant interaction over time.

I was sold on the basic concept, but there was still work to be done. Moritz tried grouping species by genus, but this preliminary sketch did not include all of the species. And although it was intriguing from a visual standpoint, there wasn’t a strong editorial reason to arrange the bees in smaller groups.

So I took the original pattern, and built out a version that included the correct number of bee circles. Once we sorted out that the graphic would actually fit on the page, Moritz started refining things. Rather than have the extinct bees pop out with a strong red and pink color, he let those circles fade back, helping to reinforce the idea of disappearing species. And to help reinforce the honeycomb pattern, Jillian Walters illustrated 3 bees, in a vintage style, as a subtle nod to the project start year.

Since this is an unusual graphic form, we needed to really craft the annotations and labels. Once again, we led in with a somewhat conversational approach, rather than just featuring a stripped-down key or legend.

For a closer look at the full graphic, see the web-friendly version here.

Let’s revisit the legacy of Scientific American.

Most of my decisions are rooted in a tradition of precise and plain visual explanations of scientific discoveries.

But I also need to be cognizant of the need to be accessible and inviting to broader audiences.

In my opinion, those goals are not mutually exclusive. But in certain contexts, even within the same magazine, it may make sense to shift a bit in one direction or the other.

I may skew towards more playful and lighter solutions, when the graphic itself also needs to function as an invitation to dive into a larger, text-driven package.

I try not to underestimate the value of a welcoming gesture, such as illustrative details.

And maybe a clearly stated and upfront invitation to read the graphic provides a little leeway to feature a more complicated or unfamiliar visualization form.

Footnote: For more of my thoughts on welcoming gestures as they relate to illustrated information graphics (not just data visualizations) see the October 2012 blog post “A Defense of Artistic License in Illustrations of Scientific Concepts.” For a pdf download of my more formal paper on the same topic, “A Defense of Artistic License in Illustrating Scientific Concepts for a Non-Specialist Audience,” (from 2CO Communicating Complexity: 2013 Conference Proceedings, edited by Nicolò Ceccarelli), click here.