Nov. 19, 2008, 8:36 a.m.

DocumentCloud: The innovation $1m in Knight money could buy

Here’s some more information about the Knight News Challenge application by ProPublica and The New York Times that generated some buzz and criticism earlier this month. They’re seeking a $1 million grant to develop an online repository of primary-source documents that anyone could contribute to or take from. I spoke at length with developers at both organizations, and they discussed the technology behind their effort, how it could benefit investigative journalism, and why they’re seeking seven figures to launch the project.

The venture, which is called DocumentCloud, seems like it could vastly improve document-based journalism. (That’s separate from the issue of whether they’re deserving of a News Challenge grant.) At the moment, when a reporter gets her hands on paper documents, the best she can typically do is post them online as scanned PDFs, where they often can’t be searched and will likely be forgotten by the end of the day. Worst of all, it’s a one-sided experience: The reporter drops a dead tree in a forest and has no idea if it ever makes a sound.

DocViewer, which is the technology behind DocumentCloud, promises several features that would address the current failings of the PDF model. It would allow users to run their documents through an OCR (optical character recognition) service that would enable full-text searches of otherwise impenetrable material. DocViewer would then rely on OpenCalais, a web service developed by Thomson Reuters, to tag documents with the names of known people and places found within the text. Any reporter who has ever attempted to wade through a thick stack of paper on deadline will immediately realize how helpful this would be.
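To make the idea concrete, here is a minimal sketch in Python of what full-text search and name tagging over OCR output might look like. It is not DocViewer's code and it does not call OpenCalais: the sample pages, the `KNOWN_ENTITIES` list, and the function names are all hypothetical stand-ins chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical OCR output: document id -> list of page texts.
PAGES = {
    "memo-a": [
        "The attorney general met with staff in Washington.",
        "No further action was taken in Seattle.",
    ],
    "memo-b": ["Staff in Seattle raised concerns."],
}

# Hypothetical list of known names, standing in for an entity-tagging service.
KNOWN_ENTITIES = {"Washington": "place", "Seattle": "place"}


def build_index(pages):
    """Map each lowercased word to the (document, page) pairs containing it."""
    index = defaultdict(set)
    for doc_id, texts in pages.items():
        for page_no, text in enumerate(texts, start=1):
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add((doc_id, page_no))
    return index


def search(index, query):
    """Return the sorted (document, page) pairs that contain every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


def tag_entities(pages, known):
    """Return, per document, the sorted known names found in its text."""
    tags = defaultdict(set)
    for doc_id, texts in pages.items():
        for text in texts:
            for name, kind in known.items():
                if re.search(r"\b" + re.escape(name) + r"\b", text):
                    tags[doc_id].add((name, kind))
    return {doc_id: sorted(found) for doc_id, found in tags.items()}


index = build_index(PAGES)
seattle_hits = search(index, "Seattle")
entity_tags = tag_entities(PAGES, KNOWN_ENTITIES)
```

With the sample pages above, `search(index, "Seattle")` points a reader to page 2 of `memo-a` and page 1 of `memo-b`, which is the kind of jump-to-the-page lookup a scanned PDF cannot offer. A real tagging service would presumably recognize far more names than a fixed list can.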

“The problem we’re trying to solve here is the problem that TPM Muckraker had when they got thousands of pages of attorney general documents, and then just sort of threw it up online and said, ‘Take a look through this,’” said Aron Pilhofer, editor of interactive news technology at the Times. That effort, which won a Polk Award, broke new ground in crowdsourced journalism, which, incidentally, is the topic we’re discussing in this month’s Lab Book Club. (And the TPM Muckraker blogger who posted those docs, Paul Kiel, now works for ProPublica.)

But the process wasn’t perfect. TPM readers had to navigate large PDF files and post their observations in the comments section of a blog post, which was helpful in the moment but limited in its long-term usefulness. “Those comments become more than just comments,” Pilhofer said. “They become actual data.”

DocumentCloud seeks to make the most of such data by allowing journalists and readers to annotate documents for all to see and benefit. Think of it as highlighting for the crowd. Pilhofer said the current proof of concept for DocViewer includes an annotation feature that’s similar to the notes users can leave on photographs in Flickr. Users will also be able to link directly to specific pages or even phrases in a document.

To get a sense of DocumentCloud’s potential, take a look at the database of Guantánamo Bay detainees that the Times made public on Nov. 3, when it was accompanied by a 1,500-word story. Each record is linked to relevant government documents that have been made public since “enemy combatants” were first held there in 2002. Pilhofer said the database isn’t using a full-featured version of DocViewer, but it certainly demonstrates the benefit of browsing documents grouped by subject rather than, say, the order in which the Defense Department happened to release them. What’s remarkable about the Gitmo collection, aside from its massive scope, is that the Times has offered up this information at all. As Pilhofer said, “It’s not usually in a newsroom’s DNA to release something like that to the public — and not just the public, the competition, too.”

Scott Klein, the director of online development for ProPublica, said that sharing — a maxim of the Internet, if not of newsrooms — would be the real power of DocumentCloud. The objective, he said, is to maximize the work of collecting documents that’s already been done on a particular topic and allow other journalists to build from there. “How can we collect those documents so the next reporter doing a story on this subject can find this information and use it and display it in a much more satisfying way?” he said.

ProPublica and the Times are asking the Knight Foundation for $1 million over three years to cover their anticipated costs. Klein said expenses would include staff to facilitate the program as well as hosting and bandwidth costs. I asked Pilhofer to respond to criticism of their application leveled by NYU’s Jay Rosen, who suggested that the for-profit Times Company shouldn’t be seeking foundation grants for its journalism. Here’s what Pilhofer said:

I can understand why some would feel that way, but I think it’s more a misunderstanding of what the project is and who it’s intended for…This is a grant submitted by us, but it’s not for us…The project is to create what we’re calling a consortium, some sort of entity that is not The New York Times, that is not ProPublica. Ideally, this will incorporate all sorts of media organizations and bloggers and watchdog groups and universities…If anything, Professor Rosen has it kind of backwards: We’re contributing to this effort. We’re contributing development resources, we’re contributing our time.

Obviously, I’m a fan of DocumentCloud and hope it sees the light of day. But whether they should receive a Knight grant is another question and depends, as my boss Josh asked, on whom the News Challenge is for. Based on the comments at my original post and around the web, it seems like DocumentCloud has generated some resentment among other News Challenge applicants more desperate for funding. One commenter also questioned whether ProPublica’s editor-in-chief, Paul Steiger, has an unfair advantage because he sits on the board of Knight, whose CEO, Alberto Ibargüen, is on the board of ProPublica. That web of ties could certainly help DocumentCloud’s application.

But what will help the project most is that it’s a good idea. And having waded through many News Challenge applications this month, I’ve seen that there’s truly a shortage of good ideas, or at least of ones with clear potential to immediately improve journalism on a broad level. Kristen Taylor, Knight’s online community manager, said as much to me when she visited Cambridge in October. So while $1 million is a lot of money (a fifth of what Knight has committed to spend on News Challenge projects this year), I’d bet that much cash that DocumentCloud will be one of the winners when they’re announced next fall.

PART OF A SERIES     Knight News Challenge 2009