June 17, 2009, 2:01 p.m.

Knight News Challenge: A grant to DocumentCloud promises a data boost for investigative journalism

The Knight News Challenge’s biggest winner, with a two-year grant of $719,500, is DocumentCloud, the primary-source index conceived by journalists and developers at ProPublica and The New York Times. Here’s why you should care: There’s good reason to believe the project will transform how some investigative journalism is conducted, and who conducts it.

Like a lot of software in the cloud, this one is complicated to explain. I wrote a long overview of DocumentCloud in November, and you can read their initial grant application in my first post about the project. Aron Pilhofer, editor of interactive news technologies at the Times and one of the project’s creators, told me on Monday, “DocumentCloud isn’t really conducive to a two-minute elevator pitch.” But later in our conversation, he ventured one: “It will turn documents into data.”

In the analog version of investigative journalism, a reporter obtains documents from sources and freedom-of-information requests, writes a story, and… that’s it. If we’re lucky, the materials are posted as unwieldy and barely searchable PDFs.

DocumentCloud’s vision is to collect, archive, and index the text and metadata of all documents used by participating news organizations, advocacy groups, bloggers, and others — “so they’re not just sitting in the corner of a newsroom collecting dust,” Pilhofer explained. That way, anyone — from other news outlets to curious readers — will be able to search across all documents in the project to find information that might not have been relevant to the original piece. If it were an animated TV series, the catchphrase might be, With our newsrooms combined — we are DocumentCloud!

Early partners in the project include the Times, ProPublica (the non-profit investigative journalism outfit), Gotham Gazette (a New York City news site published by Citizens Union Foundation, themselves winners of two Knight News Challenge grants), TPM Muckraker (the investigative arm of Talking Points Memo), and the National Security Archive (home to the largest public repository of declassified government documents). Are you salivating yet?

Anyone who has waded through the National Security Archive’s wealth of FBI files and CIA reports will immediately recognize the benefit of DocumentCloud. What if you could search across the entire archive for a particular topic of interest (Pilhofer suggested Marilyn Monroe) and get pinged whenever that topic shows up in a new document? Or what if you were a business journalist up to your neck in SEC filings? Or a local blogger keeping tabs on your congressman’s earmarks?
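To make the "get pinged" idea concrete, here is a minimal sketch of a saved-search alert. It is an illustration only, not DocumentCloud's design (which, as noted, is still being worked out): the class name `TopicAlerts`, its methods, and the plain substring match are all assumptions made for this example.

```python
from collections import defaultdict


class TopicAlerts:
    """Saved searches that fire when a newly added document mentions a topic.

    Hypothetical illustration; none of these names come from DocumentCloud.
    """

    def __init__(self):
        # lowercased topic -> set of subscriber names
        self.subscribers = defaultdict(set)

    def subscribe(self, who, topic):
        """Register interest in a topic, e.g. 'Marilyn Monroe'."""
        self.subscribers[topic.lower()].add(who)

    def notify_for(self, doc_title, text):
        """Return sorted (subscriber, topic, doc_title) alerts for one new document."""
        lowered = text.lower()
        alerts = []
        for topic, people in self.subscribers.items():
            if topic in lowered:  # plain substring match, not real search
                alerts.extend((who, topic, doc_title) for who in people)
        return sorted(alerts)
```

A reporter would call `subscribe("reporter", "Marilyn Monroe")` once, and every later call to `notify_for` with a new document's text would report whether that reporter should be pinged. A real system would presumably match on extracted entities rather than raw substrings.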

Details are still being worked out, but running materials through DocumentCloud will involve some sort of optical character recognition (to render those pesky image-based PDFs favored by some government agencies into searchable text) and Open Calais (to extract metadata like names, locations, and dates for more effective indexing). The news organizations that contribute documents will likely host them as well, said Scott Klein, editor of online development at ProPublica and a creator of DocumentCloud. He described the project as more of a “card catalog” than, say, a repository.
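The "card catalog" distinction can be sketched in a few lines: the index keeps only extracted metadata and a pointer to wherever the contributing newsroom hosts the file. This is a rough illustration under stated assumptions, not the project's implementation. The names `CatalogCard` and `CardCatalog` are hypothetical, the example URL is a placeholder, and the two regular expressions are a crude stand-in for a real entity extractor; they are not the Open Calais API. The text is assumed to have already been through OCR.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical stand-in for a real entity extractor: finds four-digit years
# and runs of two or more capitalized words. It is NOT the Open Calais API.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


@dataclass
class CatalogCard:
    """One catalog entry: metadata plus a pointer to where the document lives."""
    title: str
    host_url: str  # the contributing newsroom hosts the file itself
    entities: set = field(default_factory=set)


class CardCatalog:
    def __init__(self):
        self.cards = []
        self.by_entity = defaultdict(list)  # lowercased entity -> card positions

    def add(self, title, host_url, text):
        """Index already-OCRed text; only metadata and the URL are stored."""
        entities = set(YEAR_RE.findall(text)) | set(NAME_RE.findall(text))
        self.cards.append(CatalogCard(title, host_url, entities))
        position = len(self.cards) - 1
        for entity in entities:
            self.by_entity[entity.lower()].append(position)

    def search(self, entity):
        """Return the cards whose extracted entities include the query."""
        return [self.cards[i] for i in self.by_entity.get(entity.lower(), [])]


catalog = CardCatalog()
catalog.add(
    "FBI memo",
    "https://example.org/fbi-memo.pdf",  # placeholder URL
    "Memo on Marilyn Monroe, filed in 1962 by the Los Angeles field office.",
)
matches = catalog.search("marilyn monroe")
```

Here `matches` holds the single card for the memo, and following its `host_url` leads back to the newsroom that published the document. The catalog itself never stores the file, which is the sense in which it is a catalog rather than a repository.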

There are other aspects of the project that I’d be happy to discuss in the comments, and maybe we can get Pilhofer and Klein to weigh in here as well (as though I’m not hyping them enough). The other creators are Eric Umansky of ProPublica and Ben Koski of the Times. So if you have any questions about DocumentCloud, feel free to ask, and I’ll work on getting answers.

UPDATE, 3:30 p.m.: The DocumentCloud folks introduced their project at the Future of News and Civic Media Conference at MIT today, and I shot some raw video with Qik.

PART OF A SERIES     Knight News Challenge 2009