Nieman Foundation at Harvard
Sept. 29, 2009, 9 a.m.

Five projects on the frontier of text-based data analysis and visualization

Last week, I attended the Transparent Text symposium at IBM’s offices in Cambridge. The conference focused on text-based data storage, analysis, and visualization — awesomely nerdy stuff, in other words.

Some of the presentations would be familiar to loyal readers of this site: Amanda Michel’s distributed reporting at ProPublica, Ethan Zuckerman’s Media Cloud and “nutritional labeling” for news, DocumentCloud, and The Guardian’s crowdsourcing tool. Here, then, are five other projects that piqued my interest at the conference:


OpenCalais

I’ve mentioned OpenCalais in the context of DocumentCloud, but there’s much more to the software, which was purchased by Thomson Reuters in 2007. In a sentence, OpenCalais parses text for names, locations, organizations, and other entities to make unstructured documents more useful. Oh, and it’s free.

Above are the slides presented by Tom Tague, head of OpenCalais, whose talk focused on how publishers are using the service. The best example is on the last slide: Two investigative-journalism networks, which Tague did not name, are using OpenCalais to compare birth, death, and wedding records with government contracts to identify conflicts of interest that wouldn’t otherwise be apparent.

IBM’s DeepQA project

IBM’s successor to Deep Blue, the chess-playing supercomputer that defeated Garry Kasparov, is DeepQA, a natural language processor that’s being trained to play Jeopardy. It’s a whole different challenge, the complexities of which were explained in a New York Times article last spring and in the IBM promotional video above.

What does this have to do with journalism? Nothing, at first, but the research behind DeepQA (or “Watson,” as they call it at IBM) could improve the way information is processed and interpreted — and hasn’t that long been the news industry’s specialty?


Maplight

[Widget: Medicare Prescription Drug Price Negotiation Act of 2007 (Center for Responsive Politics)]

Maplight is a project funded primarily by the Sunlight Foundation that seeks to “illuminate” the connection between money and politics in California and the federal government. Its databases allow users to compare votes on particular bills with campaign funding from interest groups that supported or opposed the legislation. The widget above, for instance, demonstrates the correlation, if not causation, between contributions and votes on a Medicare bill in 2007.

IBM’s Many Eyes project

Many Eyes is IBM’s free data-visualization software. (I used it for two posts earlier this year.) Fernanda Viégas and Martin Wattenberg demonstrated some of their best text-based visualizations, like Word Tree, and previewed a new one that compares Google searches. The example pictured above compares the most common endings of searches for “is my son…” and “is my daughter…” Think of it as an amped-up version of Google Suggest.

Linked data at The New York Times

I actually missed this presentation, but Alexis Lloyd of The New York Times Co.’s research and development group, which we profiled at length in May, discussed how the Times is using linked data to organize its content. ReadWriteWeb reported on this project in June. The slide above, for instance, illustrates how the Times classifies airline accidents to create a more-intelligent archive of its plane-crash coverage.

Slide photos by Andreas Myhrvold Braendhaugen and lite used under a Creative Commons license.
