In an ideal world, when news breaks, reporters can fall back on their encyclopedic knowledge of local stories, events, and people to put the news in context. And failing that, they can turn to colleagues who’ve been covering a beat forever and know where to find the right files.
The ideal world doesn’t often match up with the real one, which calls on reporters to use a mix of intuition, document archives, and Google to shore up their knowledge when writing a lead on a breaking story. And if there’s a place where the real world and the ideal world meet, it’s in the files and data that every newsroom compiles but may not be able to access easily.
But with your own newsroom PANDA, that could all change. The PANDA Project, a winner of this year’s Knight News Challenge, is what developer Brian Boyer calls a “newsroom data application,” a tool that helps find context and relationships on the fly. Boyer, the news applications editor at the Chicago Tribune, will lead the project, which plans to create a set of web-based open source tools that will allow any newsroom to set up its own PANDA to analyze data whenever the need arises. (As for that name? “PANDA A News Data Application” is a cheeky recursive acronym that, Boyer hopes, isn’t too cute.)
The PANDA project’s one-year, $150,000 grant will largely go toward hiring a developer to build the application, along with some contract work to give the project a clean look and easy-to-understand features. Boyer is working in concert with Investigative Reporters & Editors (where the developer will likely be based, along with an expanded group of data producers) and the Spokane Spokesman-Review, whose online director, Ryan Pitts, is another lead on the project.
As for building the tool, “the first problem is knowledge management,” Boyer told me. “Where to stash all that information you collect. It’s a problem that all businesses have.”
News organizations, almost by their nature, have tons of data, from Census numbers and campaign finance reports to DWI records and housing prices. All of it has proved its usefulness at one point or another. But instead of letting it be dumped in a file cabinet or left to die on a forgotten disk drive, PANDA wants to give all that information a home where it can be easily accessed.
The idea for the project stemmed from an update to a similar database at the Tribune, which allows reporters to cross-check names against city records and other information. “We dig up all kinds of datasets,” Boyer remembers thinking at the time; “we could really augment this thing.” Then he and his team started thinking more broadly. They figured that if a cross-referencing database worked for the Tribune, it could work for others too, particularly smaller newspapers that may not have the resources the Trib enjoys.
One of the cornerstones of the project is Google Refine, a tool launched last year that cleans up datasets riddled with irregularities and inconsistencies. Another benefit of Google Refine, Boyer said, is that it can help draw relationships across data. “So when a reporter gets a 10,000-row campaign contribution list, they can reconcile it against databases we keep on file to see what things pop up,” Boyer said.
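To make the idea of reconciling a contribution list against files on hand a little more concrete, here is a minimal sketch in Python. It is not Google Refine’s reconciliation feature or PANDA’s design; it simply normalizes names (lowercasing, stripping punctuation, flipping “Last, First” order) and looks for exact matches in a set of stored records. The field names `donor`, `name`, and `source` and the helper functions are hypothetical, chosen for illustration only.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Hypothetical normalizer: turn 'Smith, John A.' and 'John A. Smith' into the same key."""
    name = name.strip()
    # Flip "Last, First" into "First Last" order.
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first} {last}"
    # Lowercase, drop anything that isn't a letter or whitespace, collapse spaces.
    name = re.sub(r"[^a-z\s]", "", name.lower())
    return " ".join(name.split())


def cross_reference(contributions, records):
    """Return (contribution, record) pairs whose normalized names match exactly."""
    # Index the on-file records by normalized name for quick lookup.
    index = defaultdict(list)
    for rec in records:
        index[normalize_name(rec["name"])].append(rec)

    matches = []
    for row in contributions:
        for rec in index.get(normalize_name(row["donor"]), []):
            matches.append((row, rec))
    return matches


# Hypothetical sample data: a contribution list and records kept on file.
contributions = [
    {"donor": "Smith, John A.", "amount": 500},
    {"donor": "Jane Doe", "amount": 250},
]
records = [
    {"name": "John A. Smith", "source": "city_contracts"},
    {"name": "Bob Lee", "source": "city_payroll"},
]

for row, rec in cross_reference(contributions, records):
    print(row["donor"], "->", rec["source"])
```

Exact matching on a normalized key is the simplest possible approach; real reconciliation tools typically add fuzzy matching and human review, since names like “Jon Smith” and “John Smith” would slip past this sketch.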
The initial focus will be on surveying reporters about how they would like a database like PANDA to work in their newsrooms. The next step will be finding ways to scale the project across newsrooms of varying sizes. One big question to address is how to host the large amounts of data PANDA will involve. Boyer said a cloud storage option would likely work best, but the team hasn’t worked out the specifics for that part of the project yet.
One thing Boyer has already worked out is that PANDA needs to be accessible in order to succeed. Newsrooms will have to be able to set it up easily and start putting it to use without much instruction or installation hassle. “The goal is to have a system that each news organization can put to their own use,” Boyer said. “You don’t have to have a server administrator set it up for you. I want this to be something an editor can set up for you, not your IT department.”
Panda image used under a Creative Commons license from Jenn and Tony Bot.