Knight News Challenge: A grant to DocumentCloud promises a data boost for investigative journalism

By Zachary M. Seward  /  June 17, 2009  /  2:01 p.m.

The Knight News Challenge’s biggest winner, with a two-year grant of $719,500, is DocumentCloud, the primary-source index conceived by journalists and developers at ProPublica and The New York Times. Here’s why you should care: There’s good reason to believe the project will transform how some investigative journalism is conducted — and who conducts it.

Like a lot of software in the cloud, this one is complicated to explain. I wrote a long overview of DocumentCloud in November, and you can read their initial grant application in my first post about the project. Aron Pilhofer, editor of interactive news technologies at the Times and one of the project’s creators, told me on Monday, “DocumentCloud isn’t really conducive to a two-minute elevator pitch.” But later in our conversation, he ventured one: “It will turn documents into data.”

In the analog version of investigative journalism, a reporter obtains documents from sources and freedom-of-information requests, writes a story, and… that’s it. If we’re lucky, the materials are posted as unwieldy and barely searchable PDFs.

DocumentCloud’s vision is to collect, archive, and index the text and metadata of all documents used by participating news organizations, advocacy groups, bloggers, and others, “so they’re not just sitting in the corner of a newsroom collecting dust,” Pilhofer explained. That way, anyone, from other news outlets to curious readers, will be able to search across all documents in the project to find information that might not have been relevant to the original piece. If it were an animated TV series, the catchphrase might be: “With our newsrooms combined, we are DocumentCloud!”

Early partners in the project include the Times, ProPublica (the non-profit investigative journalism outfit), Gotham Gazette (a New York City news site published by Citizens Union Foundation, itself the winner of two Knight News Challenge grants), TPM Muckraker (the investigative arm of Talking Points Memo), and the National Security Archive (home to the largest public repository of declassified government documents). Are you salivating yet?

Anyone who has waded through the National Security Archive’s wealth of FBI files and CIA reports will immediately recognize the benefit of DocumentCloud. What if you could search across the entire archive for a particular topic of interest (Pilhofer suggested Marilyn Monroe) and get pinged whenever that topic shows up in a new document? Or what if you were a business journalist up to your neck in SEC filings? Or a local blogger keeping tabs on your congressman’s earmarks?

Details are still being worked out, but running materials through DocumentCloud will involve some sort of optical character recognition (to render those pesky image-based PDFs favored by some government agencies into searchable text) and Open Calais (to extract metadata like names, locations, and dates for more effective indexing). The news organizations that contribute documents will likely host them as well, said Scott Klein, editor of online development at ProPublica and a creator of DocumentCloud. He described the project as more of a “card catalog” than, say, a repository.

There are other aspects of the project that I’d be happy to discuss in the comments, and maybe we can get Pilhofer and Klein to weigh in here as well (as though I’m not hyping them enough). The other creators are Eric Umansky of ProPublica and Ben Koski of the Times. So if you have any questions about DocumentCloud, feel free to ask, and I’ll work on getting answers.

UPDATE, 3:30 p.m.: The DocumentCloud folks introduced their project at the Future of News and Civic Media Conference at MIT today, and I shot some raw video with Qik.

