Nieman Foundation at Harvard
July 27, 2017, 9 a.m.
Reporting & Production

DocumentCloud will start asking some users to chip in as it leaves IRE for its own nonprofit

“We need to address the sustainability question — like now — and we can’t wait any longer to do it.”

For six years, DocumentCloud has enabled journalists to upload, annotate, organize, and share primary source files with readers and embed them into articles. It has also done so free of charge, for everyone.

But for some users, that’s about to change.

With just one lead developer, DocumentCloud holds about 3.6 million documents totaling roughly 60 million pages, uploaded by 8,000 accounts and stored on 31 servers. Some news organizations have uploaded more than 300,000 documents in the eight years of its existence. DocumentCloud’s supporters realized that, for a nonprofit startup with no tangible revenue in a journalism world increasingly reliant on data, documents, and cloud storage, this model was unsustainable.

“There was a very real possibility that DocumentCloud would have just simply gone away. It’s not now, thank god, but I think that was a significant wakeup call for everybody,” said Aron Pilhofer, a cofounder of DocumentCloud. “We need to address the sustainability question — like now — and we can’t wait any longer to do it.”

Formerly the executive editor of digital at The Guardian and interactive editor at The New York Times, Pilhofer joined Temple University as journalism innovation professor almost a year ago. Now he’s taking over DocumentCloud again as it transitions out of its long-time home at Investigative Reporters and Editors (IRE) and into an independent nonprofit that will operate in collaboration with Temple as of August 1. (There should be no interruption in service to DocumentCloud users.) The Knight Foundation (disclosure: also a Nieman Lab funder) is providing a $250,000 grant to cover the transition as DocumentCloud finds its footing.

First thing on the agenda? “We said from Day One that at some point we would ask news organizations who use DocumentCloud to support DocumentCloud. We’ve always said that. We just have never gotten around to actually doing it,” Pilhofer said. With the grant, “the singular objective that we have is to make DocumentCloud sustainable — forever.”

DocumentCloud has never been a for-profit venture. Its humble beginnings were backed by a two-year, $700,000 Knight News Challenge grant to The New York Times and ProPublica, when Pilhofer collaborated with ProPublica’s now-deputy managing editor Scott Klein to pitch the idea as the future of document-based journalism. Former Nieman Lab staffer Zach Seward described the problem the team was trying to solve in 2008, barely three weeks into Nieman Lab’s existence:

At the moment, when a reporter gets her hands on paper documents, the best she can typically do is post them online as scanned PDFs, where they often can’t be searched and will likely be forgotten by the end of the day. Worst of all, it’s a one-sided experience: The reporter drops a dead tree in a forest and has no idea if it ever makes a sound.

DocViewer, which is the technology behind DocumentCloud, promises several features that would address the current failings of the PDF model. It would allow users to run their documents through an OCR (optical character recognition) service that would enable full-text searches of otherwise impenetrable material. Then DocViewer relies on OpenCalais, a web service developed by Thomson Reuters, which can tag documents with the names of known people and places found within the text. Any reporter who has ever attempted to wade through a thick stack of paper on deadline will immediately realize how helpful this would be.

DocumentCloud signed up 20 investigative journalism outlets as a consortium of testers in 2009 and became part of IRE two years later. While the partnership was called a “win-win” by the Knight Foundation at the time, it wasn’t exactly a perfect match. Pilhofer said this was brought to his attention when he moved back to the United States from London last November.

“While IRE had been in many ways a good host for DocumentCloud — in terms of IRE being sort of the core audience — it was clear to everybody that IRE wasn’t actually the right place for DocumentCloud,” Pilhofer acknowledged. “IRE isn’t set up to run a technology platform.”

Doug Haddix, IRE executive director, said the board of directors voted unanimously in favor of the transfer. In a statement, he said:

IRE has full confidence in Aron’s leadership and technical expertise to continue enhancing DocumentCloud as a critical tool for investigative journalism. Aron and I have worked closely together to ensure a smooth transition, with no disruption in service or features for the journalists who rely on DocumentCloud.

During IRE’s stewardship, DocumentCloud has dramatically expanded the service’s technical capabilities, added numerous features and optimized it for mobile devices. Journalists have uploaded more than 3 million documents comprising an estimated 52 million pages.

Moving forward, IRE trainers will continue to promote DocumentCloud as an essential service for journalists — especially its powerful tools for deep analysis of documents.

IRE is grateful to the Knight Foundation for its financial support of DocumentCloud and its endorsement of this new home for the service.

Three people are listed on DocumentCloud’s website as IRE employees, but lead developer Ted Han will be the only staff member to carry on with the project, at least initially. Pilhofer said he personally won’t be taking a salary from the nascent nonprofit as the “bare bones operation” of DocumentCloud adjusts — and starts asking heavy users to pitch in. The team is looking to ramp up staff numbers, build out features to help reporters verify documents, and defray the costs of those 30-plus servers as DocumentCloud continues to grow globally.

“You can’t just put stuff up in a cloud, turn a key, and walk away. You need to have a team working on it to maintain it and keep going, but also to respond to changes,” Pilhofer said, though he noted DocumentCloud’s platform is still “rock solid” and secure. These changes include both technology and scale: “If we have a bunch of documents sitting in servers on S3 in Virginia and you’re trying to access a document in Australia, you might as well get a sandwich before the document is going to load for you.”

The 501(c)(3) nonprofit will invite participation from students at Temple University, though it isn’t technically a program of Temple. It won’t receive direct financial support from the university aside from Pilhofer’s time as an in-kind donation and other resources as needs pop up. (Remember, the goal of this move is sustainability through DocumentCloud itself.) Pilhofer will lead as executive director and the nonprofit will have board members from DocumentCloud’s past, with cofounder Scott Klein signing on, and its future, with representation from Temple.

“With so many journalists now becoming entrepreneurs, product managers, designers, technologists…[it is] an absolute gift [for Temple students] to be able to learn within a real-world laboratory like DocumentCloud, a platform used by journalists around the world every day,” said Pilhofer, who came to Temple as the first professor in a $2 million endowed chair of journalism innovation at its school of media and communication. “There are lots and lots of labs out there doing very interesting theoretical work on how technology can improve the practice of journalism. Temple will have something 1,000 times better: a production platform that has already scaled to thousands of newsrooms.”

While Knight will provide the initial funding, DocumentCloud’s long-term revenue strategy is currently twofold. One part involves some news organizations and individual journalists contributing through a tiered, subscription-like service. The details are far from finalized, but Pilhofer emphasized that it would be “insanely affordable,” especially compared to the potential expenses of developing and running one’s own journalism-focused document storage system. He also noted that DocumentCloud will always maintain the option of a free account and that news organizations will never have to pay for the documents they make public.

“When you’re talking about a journalist using DocumentCloud once a week uploading a document or two, those people are not going to be impacted at all. If they want to contribute to us, great,” he said. “But if you’re uploading 40,000 documents on a Thursday…we also think that’s fair to ask those organizations to help support what we’re trying to do here.”

The second part turns to sponsorship from other entities — he identified platforms such as Google and Facebook as possibilities, along with “big media companies” — to help the nonprofit break even.

Despite all the changes, Pilhofer wants to reassure users that DocumentCloud will stay true to its open source, transparency-in-journalism roots.

“DocumentCloud was a great platform to help journalists do a thing, but what we actually wanted to change was to make journalism more transparent, full stop…to get journalists to show their work,” he said. “DocumentCloud is one of the few platforms out there where we can tangibly make an impact on how people perceive and trust journalism.”

Obligatory cloud photo by Pattys-photos used under a Creative Commons license.
