ProPublica and NYT seek $1M to put everyone’s documents online

By Zachary M. Seward, Nov. 2, 2008  /  1:17 p.m.

[Saturday was the deadline for submissions for this year's Knight News Challenge. In the coming days and weeks, we'll be looking at some of the most interesting applicants. If you know of one you think worth highlighting, let us know, via email or in the comments. —Ed.]

Two of the biggest names in journalism have applied to this year’s Knight News Challenge: The pioneering investigative-reporting non-profit ProPublica and The New York Times are seeking $1 million from the Knight Foundation to launch an online repository of primary-source documents. The project could lead to greater information sharing among news organizations and their audience. As they put it in their grant application:

Documents are the foundation of investigative journalism, but today’s newsroom is a throwaway culture. Too often, reporters gather reams of information, do their stories, then chuck rich source documents into a dusty corner, never again to see the light of day.

The project, which is called DocumentCloud, would let news organizations upload their materials for public consumption and analysis. (“Readers will also be able to quickly search, annotate and bookmark documents — and for the first time link directly to specific pages or passages.”)

The proposal relies on a piece of software called DocViewer, which was developed by the Times’ Interactive Newsroom Technologies team. The head of that team, Aron Pilhofer, recently confirmed that the Times will release DocViewer as open source “sometime after the election.” Brian Boyer, the blogger who broke that news, said the software was created by the Times for its searchable database of Hillary Clinton’s 11,000-page public schedule as first lady, which was a journalistic marvel.

In an email today, Pilhofer said the application has already made it to the second round of the News Challenge, and he explained the proposal’s provenance:

The project started with a conversation between Scott Klein, Eric Umansky (of ProPublica) and me and my boss, Marc Frons. They were interested in using our DocViewer, and we were talking about the possibility of just open sourcing the darn thing. So, we got into one of those… “Hey, wouldn’t it be cool if we could also…” sorts of conversations, and things went from there.

DocumentCloud would focus initially on New York City “because it has favorable FOI laws and a vibrant journalism and blogging community.” (The community focus is also a requirement of the News Challenge.) A consortium of media outlets, bloggers, and watchdog groups would submit documents, though the application mentions only one partner on board: the Gotham Gazette, a news website published by the Citizens Union Foundation of the City of New York. ProPublica also plans to contribute state- and federal-government documents.

For the technically inclined, DocumentCloud will run on open APIs, so readers or other news organizations could search and interact with the document database as necessary for investigative projects. “Think of it as a ‘card catalog’ of standardized metadata for primary source documents,” the application argues.
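
To make the "card catalog" idea a little more concrete, here is a minimal Python sketch of what a standardized metadata record and a fielded search over it might look like. The application does not spell out a schema or an API, so everything here is an assumption for illustration: the `DocumentRecord` class, the `search` function, the field names (loosely based on the kinds of metadata the application mentions, such as topics, agencies, locations and publication date), and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DocumentRecord:
    """Hypothetical catalog entry; not DocumentCloud's actual schema."""
    title: str
    contributor: str
    published: date
    topics: list[str] = field(default_factory=list)
    agencies: list[str] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)


def search(catalog, topic=None, agency=None, location=None):
    """Return records matching every filter that was supplied (case-insensitive)."""
    def has(values, wanted):
        # A filter left as None matches every record.
        return wanted is None or wanted.lower() in (v.lower() for v in values)

    return [
        rec for rec in catalog
        if has(rec.topics, topic)
        and has(rec.agencies, agency)
        and has(rec.locations, location)
    ]


# Invented sample data, for illustration only.
catalog = [
    DocumentRecord("School construction audit", "Example Gazette", date(2008, 5, 1),
                   topics=["education", "budget"],
                   agencies=["Department of Education"],
                   locations=["New York City"]),
    DocumentRecord("Bridge inspection report", "Example Blog", date(2007, 11, 15),
                   topics=["infrastructure"],
                   agencies=["Department of Transportation"],
                   locations=["Albany"]),
]

matches = search(catalog, topic="budget", location="new york city")
print([rec.title for rec in matches])
```

The point of the sketch is only that consistent fields make documents from different contributors searchable in the same way; a real service would presumably expose something like this over the web rather than as an in-memory list.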

It isn’t clear if the project could or would go ahead without funding from Knight, which will award its News Challenge grants next summer. ProPublica’s $10-million annual budget is funded primarily by the Sandler Foundation. We’ve sent an email to Mike Webb, ProPublica’s director of communications, seeking more information.

The full text of the grant application is below the jump.


Project Title: DocumentCloud

Requested amount from Knight News Challenge: $1,000,000

Expected amount of time to complete project: 3 [years]

Total cost of project including all sources of funding: $1,000,000

Describe your project:

What is it? DocumentCloud is software, a website, and a set of open standards and APIs that will accelerate the daily work of investigative reporters, and will make investigative reporters out of every citizen, by improving the way we find, share, read and collaborate on source documents online.

Why do we need it? Documents are the foundation of investigative journalism, but today’s newsroom is a throwaway culture. Too often, reporters gather reams of information, do their stories, then chuck rich source documents into a dusty corner, never again to see the light of day. Documents that are placed on the web are typically just PDFs — a poor user experience that places documents out of context and, often, out of reach when the story fades from public consciousness. Further, news outfits do not benefit from the wisdom of the crowd since there is no good way to collaboratively examine large document sets.

How will it do it? DocumentCloud will host, and provide an open API to, an online database of source documents, contributed by a consortium of news orgs, watchdog groups and bloggers. Think of it as a “card catalog” of standardized metadata for primary source documents. Once submitted to DocumentCloud, documents can be found, linked to, and retrieved by anyone, anywhere on the Web. Thanks to the metadata, users will be able to search by topic, agency, or location. The project will lower barriers of participation by creating open standards and open-source software. DocViewer, a best-of-class web-based application, will allow even the smallest organizations to publish their documents online and contribute to DocumentCloud. Readers will also be able to quickly search, annotate and bookmark documents — and for the first time link directly to specific pages or passages.

How will your project improve the way news and information are delivered to geographic communities? Because source documents are often more scarce in metro reporting than in national, DocumentCloud can make its biggest initial impact by helping create an infrastructure for sharing on the local level. We’ve picked New York City for our initial rollout because it has favorable FOI laws and a vibrant journalism and blogging community. We have an agreement with our first local partner, Gotham Gazette, to work with us to build and test the software and APIs. They’ll also join the consortium, which will grow over the period of the grant to include many other local and national news organizations, bloggers and watchdog groups. While our pilot will focus on New York City, we will also include source documents from state and federal governments.

How is your idea innovative? (new or different from what already exists) DocumentCloud will take source documents beyond the inherent constraints of the PDF and out of the realm of clumsy scans or external application plug-ins and for the first time make them an intrinsic part of the semantic Web, and a part of reporting news online. Sharing information becomes much easier when you can share specific pages or paragraphs as well as entire documents. Source documents will be easier to find because users can search through fielded metadata, such as topics, locations, people, government agencies, publication date and other variables. Though of course the project stands on the shoulders of initiatives like Brewster Kahle’s Internet Archive, as well as the Open Archives Initiative, nothing like DocumentCloud exists.

What experience do you or your organization have to successfully develop this project? The New York Times has been at the forefront of the industry by fully integrating its newsroom and digital operations, and a leading innovator for digital content on the web among other platforms. In the past year, The Times has developed and launched a number of innovative products, including the Times People social network, Times Machine, an iPhone Times reader, the Times Developer Network, the Visualization Lab and two APIs. The Times is among the only major media organizations to form a dedicated team of journalist/developers focused exclusively on news projects, including the paper’s extensive Olympics and elections coverage this year. This team, Interactive Newsroom Technologies, has already built a lightweight version of the DocViewer, which will be released as an open source project. ProPublica, the new, non-profit newsroom, has the largest team of reporters dedicated to investigative journalism anywhere in the country. It is uniquely qualified to help manage the effort, not only because its reporters could be “power-users” of the service but because it was organized to take on just this kind of effort — collaborating with newsrooms around the country. ProPublica has already partnered with news organizations including Newsweek, 60 Minutes, Politico, the Albany Times-Union and the Los Angeles Times. Unlike most news organizations, ProPublica does not have an economic incentive to be competitive with other news organizations — in fact, just the opposite. Its model relies on just the kind of collaboration that will help spread DocumentCloud virally.

[Update: See Jay Rosen's concerns on this application here.]


This entry was written by Zachary M. Seward and posted on November 2, 2008 at 1:17 pm.


28 comments:

  1. Fred Howell at 6:53 am, November 3, 2008

    For online annotation & collaboration on PDFs / Word docs in the browser there’s also A.nnotate.com (our service) – some other tools like google and amazon book search also use the technique of rendering documents as images to make them render quickly on the web without waiting for the whole document to download and separate viewer plugins to start up.

    A number of flash-based document sites have appeared recently too – like scribd / docstoc / edocr – but these don’t do much more than display the document in a flash panel, which doesn’t seem like such a big win over having a link to a pdf.

     
  2. David Poulson at 11:24 am, November 3, 2008

    Intriguing concept. Puzzling how it has already made it to the second round when the deadline was just Saturday.

     
  3. David Poulson at 11:32 am, November 3, 2008

    With an invitation like that, I’m sure you’ll be flooded by applicants looking for exposure. But I think we have a nifty idea. It ties state-level campaign finance reports with the votes of state lawmakers and with bill analyses. It emphasizes the impact of multi-state political contributions on regional environmental policy. See:
    http://tinyurl.com/5heo6r

     
  4. Gabriel Sama at 11:27 am, November 3, 2008

    I sent a somewhat similar proposal, although as an individual mine is not as complete as the Times one. I wonder how the judges will tackle ideas that overlap. Mine is called Acceso (Access in Spanish) and targets Mexico City.

    http://tinyurl.com/5w5j8o

    My other two ideas that are still in competition are:
    News-Point: http://tinyurl.com/6dwagg
    Crimesourcing: http://tinyurl.com/6hvkf4
    Hope to read your comments. Thanks,
    Gabriel Sama

     
  5. Anna at 1:22 pm, November 10, 2008

    I wonder how this would compare&contrast to the UCSF tobacco documents archive. If it’s just duplicating effort, bad. But if it’d make a job like that much easier…
    (Here’s hoping we get to put many more such document sets online in future.)

     
  6. Anna at 10:53 pm, November 10, 2008

    Does ProPublica editor-in-chief Paul Steiger’s Knight Foundation trusteeship constitute a conflict of interest? How about Knight Foundation president & CEO Alberto Ibargüen’s position on the ProPublica board?

    And while we’re on the topic of possibly odd arrangements, what’s the deal with NewsU offering incentives to its users to write testimonials (“We’re giving away prizes for the best stories that are submitted”), to use in order to get more funding? Is this a legitimate tactic?

     
  7. Anna Haynes at 5:34 pm, December 6, 2008

    Just ran across something from Ask Metafilter, that might be relevant -

    http://ask.metafilter.com/108087/Examples-of-Onlne-Archives-that-Allow-Users-to-Add-Metadata

    (and I wonder why blogging platforms don’t segregate trackbacks from comments, or somehow make it possible to hide the trackbacks…)

     

Trackbacks:

  1. Andrew Golis » Blog Archive » links for 2008-11-02 at 5:00 pm, November 2, 2008

    [...] ProPublica seeks $1M to put everyone’s documents online ProPublica wants to fund a big doc dump. Good use of $$, smart journalism. NYTs API meets Sunlight Foundation. (tags: new.media pro.publica journalism non.profit transparency) [...]

     
  2. Defining who the Knight News Challenge is for » Nieman Journalism Lab » Pushing to the Future of Journalism at 10:54 pm, November 2, 2008

    [...] Jay Rosen doesn’t seem too thrilled by the ProPublica/NYT application for the Knight News Challenge — at least based on his Twittering [...]

     
  3. Six newspapers in Ohio to drop their Monday print editions/NYT & ProPublica apply for Knight News Challenge $ « Media history…in the making at 1:15 pm, November 3, 2008

    [...] point of interest: ProPublica and NYT seek $1M to put everyone’s documents online through the Knight News Challenge. I am surprised to see such a well-known name apply for this [...]

     
  4. Build the Echo » Blog Archive » links for 2008-11-04 at 10:05 am, November 4, 2008

    [...] ProPublica and NYT seek $1M to put everyone’s documents online » Nieman Journalism Lab » Pushing… Document Cloud: an experiment worth keeping an eye on (tags: buildtheecho impact rebirth_of_journalism) [...]

     
  5. NYT and ProPublica seek $1 million to put news source docs online | The Current Buzz - Tech at 3:00 pm, November 4, 2008

    [...] November 4th, 2008 | Fun Tech Snip from a Nieman Journalism Lab blog post about an interesting application in this year’s Knight News Challenge: The pioneering [...]

     
  6. links for 2008-11-04 : Gerard Barberi at 6:04 pm, November 4, 2008

    [...] ProPublica and NYT seek $1M to put everyone’s documents online » Nieman Journalism Lab » Pushing… I don't believe those "innovations" listed by NYT are very innovative. Annotated link http://www.diigo.com/bookmark/http%3A%2F%2Fwww.niemanlab.org%2F2008%2F11%2Fpropublica-seeks-1m-to-put-everyones-documents-online (tags: myblog journalism nytimes knight_news) [...]

     
  7. Links 11/05/2008 : Gerard Barberi at 7:59 pm, November 4, 2008

    [...] ProPublica and NYT seek $1M to put everyone’s documents online » Nieman Journalism Lab » Pushing… [...]

     
  8. Opera Tronickss » Blog Archive at 7:52 am, November 5, 2008

    [...] ProPublica and NYT seek $1M to put everyone’s documents online (Nieman Lab)… Snip from a Nieman Journalism Lab blog post about an interesting application in this year’s Knight News Challenge: The pioneering [...]

     
  9. Running Design - Friendly and Speedy New York City Web Shop » Blog Archive » Live Blog: Changing Media Landscape at Columbia at 7:26 pm, November 11, 2008

    [...] Blog: Changing Media Landscape at Columbia 7:23 p.m. Sewell: Mentions Knight News Challenge Grant in conjunction with ProPublica and Clay Shirky’s book, Here Comes [...]

     
  10. LSDI : Document Cloud, un archivio per i documenti a rischio di oblio at 2:26 am, November 13, 2008

    [...] progetto – spiega un articolo del Nieman Journalism Lab – parte dalla considerazione che: i documenti sono la base del giornalismo investigativo, ma [...]

     
  11. LediNews » Document Cloud, un archivio per i documenti a rischio di oblio at 6:46 am, November 13, 2008

    [...] condivisione delle informazioni fra redazioni e lettori. Il progetto – spiega un articolo del Nieman Journalism Lab – parte dalla considerazione che: i documenti sono la base del giornalismo investigativo, ma le [...]

     
  12. DocumentCloud: The innovation $1m in Knight money could buy » Nieman Journalism Lab » Pushing to the Future of Journalism at 8:38 am, November 19, 2008

    [...] some more information about the Knight News Challenge application by ProPublica and The New York Times that generated some buzz and criticism earlier this month. [...]

     
  13. Dadblog » links for 2008-11-27 at 10:03 am, November 27, 2008

    [...] ProPublica and NYT seek $1M to put everyone’s documents online » Nieman Journalism Lab » Pushing… Describe your project: What is it? DocumentCloud is software, a website, and a set of open standards and APIs that will accelerate the daily work of investigative reporters, and will make investigative reporters out of every citizen, by improving the way we find, share, read and collaborate on source documents online. (tags: nytimes media journalism innovation data) [...]

     
  14. links for 2008-11-27 | I’ve Said Too Much at 6:39 am, November 28, 2008

    [...] ProPublica and NYT seek $1M to put everyone’s documents online » Nieman Journalism Lab » Pushing… Describe your project: What is it? DocumentCloud is software, a website, and a set of open standards and APIs that will accelerate the daily work of investigative reporters, and will make investigative reporters out of every citizen, by improving the way we find, share, read and collaborate on source documents online. (tags: nytimes media journalism innovation data) This entry was posted on Thursday, November 27th, 2008 at 4:03 pm . You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site. [...]

     
  15. Resources for documentary research, or what the Old Pros can teach New Media « UMass Journalism Professors Blog at 3:31 pm, December 4, 2008

    [...] analysis by others. It’s about three years away. (A full explanation and their grant proposal is here. More on it [...]

     
  16. Chat wrap-up: College newspaper collaboration – Innovation in College Media at 2:02 am, February 9, 2009

    [...] New York Times and ProPublica are looking into doing something similar through DocumentCloud, which would be a place for reporters to store documents they gather during [...]

     
  17. A deep breath… - okiejournalist.net at 7:55 pm, February 23, 2009

    [...] here. Combining that with what Aron Pilhofer and his crew at The Times are doing with ProPublica to make DocumentCloud, I think there’s a ton of new possibilities out [...]

     
  18. Live Blog: Changing Media Landscape at Columbia University | News Startup at 12:48 am, May 14, 2009

    [...] p.m. Sewell: Mentions Knight News Challenge Grant in conjunction with ProPublica and Clay Shirky’s book, Here Comes [...]

     
  19. Knight News Challenge: A grant to DocumentCloud promises a data boost for investigative journalism » Nieman Journalism Lab at 2:02 pm, June 17, 2009

    [...] long overview of DocumentCloud in November, and you can read their initial grant application in my first post about the project. Aron Pilhofer, editor of interactive news technologies at the Times and one of [...]

     
  20. DocumentCloud adds impressive list of investigative-journalism outfits » Nieman Journalism Lab at 8:10 am, September 24, 2009

    [...] souped-up repository of primary-source material that I’ve been raving about since it first emerged in November, has a big announcement today: They’ve signed up 20 more organizations — [...]

     
  21. What Happens to Reporters’ Primary Sources in a Digital World? « Predicate, LLC | Editorial + Content Strategy at 9:40 am, October 31, 2009

    [...] Documents are the foundation of investigative journalism, but today’s newsroom is a throwaway culture. Too often, reporters gather reams of information, do their stories, then chuck rich source documents into a dusty corner, never again to see the light of day.via ProPublica and NYT seek $1M to put everyone’s documents online » Nieman Journalism Lab … [...]

     
