Nieman Foundation at Harvard
Sept. 29, 2009, 9 a.m.

Five projects on the frontier of text-based data analysis and visualization

Last week, I attended the Transparent Text symposium at IBM’s offices in Cambridge. The conference focused on text-based data storage, analysis, and visualization — awesomely nerdy stuff, in other words.

Some of the presentations would be familiar to loyal readers of this site: Amanda Michel’s distributed reporting at ProPublica, Ethan Zuckerman’s Media Cloud and “nutritional labeling” for news, DocumentCloud, and The Guardian’s crowdsourcing tool. Here, then, are five other projects that piqued my interest at the conference:

OpenCalais

I’ve mentioned OpenCalais in the context of DocumentCloud, but there’s much more to the software, which was purchased by Thomson Reuters in 2007. In a sentence, OpenCalais parses text for names, locations, organizations, and other entities to make unstructured documents more useful. Oh, and it’s free.

Above are the slides presented by Tom Tague, head of OpenCalais, whose talk focused on how publishers are using the service. The best example is on the last slide: Two investigative-journalism networks, which Tague did not name, are using OpenCalais to compare birth, death, and wedding records with government contracts to identify conflicts of interest that wouldn’t otherwise be apparent.
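To make that kind of cross-referencing concrete, here is a minimal Python sketch. It is not OpenCalais code and does not call the OpenCalais service; it assumes the names have already been extracted from the documents. The `flag_contracts` helper, the record layout, and every name in the sample data are invented for illustration.

```python
def normalize(name):
    """Lowercase a name and collapse extra whitespace so simple variants match."""
    return " ".join(name.lower().split())


def flag_contracts(vital_records, contracts):
    """Return IDs of contracts whose official and contractor share a vital record.

    Hypothetical helper: a real project would need far more careful name matching.
    """
    # Collect every unordered pair of people named together in one record.
    related = set()
    for record in vital_records:
        people = [normalize(p) for p in record["people"]]
        for i, first in enumerate(people):
            for second in people[i + 1:]:
                related.add(frozenset((first, second)))

    # Flag contracts where the awarding official and the contractor are such a pair.
    flagged = []
    for contract in contracts:
        pair = frozenset(
            (normalize(contract["official"]), normalize(contract["contractor"]))
        )
        if pair in related:
            flagged.append(contract["id"])
    return flagged


# Made-up sample data, not drawn from any real records.
vital_records = [
    {"type": "wedding", "people": ["Ana Example", "Ben  Sample"]},
    {"type": "birth", "people": ["Cy Placeholder", "Di Placeholder"]},
]
contracts = [
    {"id": "C-1", "official": "ana example", "contractor": "Ben Sample"},
    {"id": "C-2", "official": "Ana Example", "contractor": "Di Placeholder"},
]

print(flag_contracts(vital_records, contracts))  # prints ['C-1']
```

Exact matching on normalized names is the simplest possible choice here. It would miss nicknames and misspellings, and it would wrongly link unrelated people who happen to share a name.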

IBM’s DeepQA project

IBM’s successor to Deep Blue, the chess-playing supercomputer that defeated Garry Kasparov, is DeepQA, a natural language processor that’s being trained to play “Jeopardy!” It’s a whole different challenge, the complexities of which were explained in a New York Times article last spring and in the IBM promotional video above.

What does this have to do with journalism? Nothing, at first, but the research behind DeepQA (or “Watson,” as they call it at IBM) could improve the way information is processed and interpreted — and hasn’t that long been the news industry’s specialty?

Maplight

[Widget: Medicare Prescription Drug Price Negotiation Act of 2007 (at MAPLight.org); Center for Responsive Politics]

Maplight is a project funded primarily by the Sunlight Foundation that seeks to “illuminate” the connection between money and politics in California and the federal government. Its databases allow users to compare votes on particular bills with campaign funding from interest groups that supported or opposed the legislation. The widget above, for instance, demonstrates the correlation, if not causation, between contributions and votes on a Medicare bill in 2007.
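The underlying comparison is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not Maplight's method or data: the `average_by_vote` helper, the field names, and all the dollar figures below are invented.

```python
from statistics import mean


def average_by_vote(legislators, key):
    """Average one contribution field separately for 'yes' and 'no' voters."""
    groups = {"yes": [], "no": []}
    for legislator in legislators:
        groups[legislator["vote"]].append(legislator[key])
    # An empty group averages to 0 rather than raising an error.
    return {vote: mean(amounts) if amounts else 0 for vote, amounts in groups.items()}


# Made-up figures: dollars received from groups that supported or opposed a bill.
legislators = [
    {"name": "Legislator A", "vote": "yes", "from_supporters": 3000, "from_opponents": 1000},
    {"name": "Legislator B", "vote": "yes", "from_supporters": 5000, "from_opponents": 0},
    {"name": "Legislator C", "vote": "no", "from_supporters": 1000, "from_opponents": 4000},
    {"name": "Legislator D", "vote": "no", "from_supporters": 0, "from_opponents": 6000},
]

print(average_by_vote(legislators, "from_supporters"))
print(average_by_vote(legislators, "from_opponents"))
```

A gap between the two averages shows only that money and votes moved together in the sample. It says nothing about which caused which.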

IBM’s Many Eyes project

Many Eyes is IBM’s free data-visualization software. (I used it for two posts earlier this year.) Fernanda Viégas and Martin Wattenberg demonstrated some of their best text-based visualizations, like Word Tree, and previewed a new one that compares Google searches. Pictured above, it shows the most common endings of searches for “is my son…” and “is my daughter…” side by side. Think of it as an amped-up version of Google Suggest.
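For the curious, the counting behind a comparison like that can be sketched in Python. This is not Many Eyes code, and I don't know how the actual visualization gathers its data; the `top_endings` helper and the sample queries are invented for illustration.

```python
from collections import Counter


def top_endings(queries, prefix, n=3):
    """Count how often each ending follows a given query prefix."""
    prefix = prefix.lower()
    endings = Counter()
    for query in queries:
        query = query.lower()
        if query.startswith(prefix):
            ending = query[len(prefix):].strip()
            if ending:  # skip queries that are just the bare prefix
                endings[ending] += 1
    return endings.most_common(n)


# Made-up queries, not real search data.
queries = [
    "is my son left-handed",
    "is my son tall for his age",
    "is my son left-handed",
    "is my daughter tall for her age",
    "is my daughter left-handed",
    "is my daughter tall for her age",
]

for prefix in ("is my son", "is my daughter"):
    print(prefix, top_endings(queries, prefix))
```

Putting the two ranked lists next to each other is what makes the comparison interesting. The visualization's job is to make that difference visible at a glance.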

Linked data at The New York Times

I actually missed this presentation, but Alexis Lloyd of The New York Times Co.’s research and development group, which we profiled at length in May, discussed how the Times is using linked data to organize its content. ReadWriteWeb reported on this project in June. The slide above, for instance, illustrates how the Times classifies airline accidents to create a more-intelligent archive of its plane-crash coverage.

Slide photos by Andreas Myhrvold Braendhaugen and lite used under a Creative Commons license.
