Twitter  Contently is launching a nonprofit arm to do investigative reporting  
Nieman Journalism Lab
Pushing to the future of journalism — A project of the Nieman Foundation at Harvard

Dataviz, democratized: Google opens Public Data Explorer

Two years ago, Google acquired Gapminder, the Swedish graphics-display company whose Trendalyzer software specializes in representing data over time. (You may recall the company from this awesome and much-circulated TED talk from 2006.) Since the acquisition, Google has built out the Trendalyzer software to create its Public Data Explorer, a tool that makes large datasets easy to visualize — and, for consumers, to play with. The Explorer has created interactive and dynamic data visualizations of information about traditionally hard-to-grasp concepts like unemployment figures, income statistics, world development indicators, and more. It’s a future-of-context dream.

“It’s about not just looking at data, but really understanding and exploring it visually,” Benjamin Yolken, Google Public Data’s product manager, told me. The project’s overall mission, it’s worth noting, is a kind of macro-meets-meta version of journalism’s: “to make the world’s public data sets accessible and useful.”

The big catch, though, as far as journalism goes, has been that users haven’t been able to do much with the tool besides look at it. If you’ve gathered public data sets that would lend themselves to visualization on the Explorer, you’ve had to contact Google and ask them to visualize it for you. (“While we won’t be able to individually reply to everyone who fills out this form,” a contact form noted, “we may be in touch to learn more about your data.”)

Today, though, that’s changing: Google is opening up its Explorer tool. Yolken and Omar Benjelloun, Google Public Data’s tech lead, have written a new data format, the Dataset Publishing Language (DSPL), designed particularly to support dynamic dataviz. “DSPL is an XML-based format designed from the ground up to support rich, interactive visualizations like those in the Public Data Explorer,” Benjelloun notes in a blog post announcing the opening. (It’s the same language that the Public Data team had been using internally to produce its datasets and visualizations.) Today, that language — and an interface facilitating data upload — are available for anyone to use, putting the “public” in “public data.”

It’s an experimental feature that, like the Public Data Explorer itself — not to mention some of Google’s most fun features (Google Scribe, Google Body, Google Books’ Ngrams viewer, etc.) — lives under the Google Labs umbrella. And, importantly, it’s a feature, Yolken notes, that “allows users who may or may not have technical expertise to explore, visually, a number of public data sets.”

The newly open tool could be particularly useful for news organizations that would like to get into the dataviz game, but that don’t have the resources — of time, of talent, of money — to invest in proprietary systems. (The papers of the Journal Register Company, a news organization that has made a point of experimenting with free, web-based journalistic tools, comes to mind here — though any news outfit, big or small, could benefit.) The Public Data team had two main goals in opening up the Explorer tool to users, Yolken notes: Increasing the datasets available to be visualized and, then, distributing them. “First, we want to have lots of data sets available that are credible and useful and interesting,” he says. Second, the hope is that the tool’s embedding capabilities will allow for easy sharing of those data sets.

Though the Explorer platform is now open to anyone — and though Yolken and Benjelloun mention teachers and students as groups who might do some interesting experiments with it — they hope that journalists, in particular, will make use of the tool. Even more particularly: “data-driven journalists.”

To that end, the tool isn’t as intuitively understandable as, say, the awesomely easy Ngrams book viewer tool — “we realized that, in order to show the data properly, to make the data understandable, you really needed to describe the metadata,” Benjelloun notes — but nor does it require special expertise to use. “This format doesn’t require engineering skills,” Yolken says; then again, “it’s not as easy as a spreadsheet.” It’s somewhere in the middle — akin to learning, say, basic HTML. (Here’s more on how to use it.)

But if journos can get beyond the initial learning curve (one that, for data-driven journos, in particular, won’t be especially steep), they, and their readers, could benefit doubly. The Explorer tool allows users not just to create dynamic data visualizations, but also to avail themselves of a new way to understand those data in the first place. In other words: The tool could prove useful from both the presentation and the production ends of the journalistic spectrum. There’s something about watching data move over time, Yolken notes, that changes your perspective as a consumer of those data. “It makes you start asking questions that you wouldn’t have asked before.”

What to read next
John Wihbey    
When journalists factcheck politicians (or don’t), how to flag bad behavior on social media, and getting past slactivism: all that and more in this month’s roundup of the academic literature.