Nieman Journalism Lab
Pushing to the future of journalism — A project of the Nieman Foundation at Harvard

Free the Files! ProPublica taps the crowd for a database-building sprint to election day

The site wants your help assembling a database of television ad buys in swing states.

Political transparency geeks got both good news and bad news from the Federal Communications Commission last April.

Good news first: The FCC decided it would require television stations to put information about political ad buys online.

The bad news: Only stations from the top 50 markets are required to do so, and stations can post the files as image PDFs — meaning there’s no easy way to search records by the name of the ad buyer from the FCC database.

And what good is a bunch of data if you can’t extract meaning?

ProPublica is coming to the rescue — it hopes, with your help — with a project launched today called Free the Files, the latest iteration of an ongoing effort to examine political ads and the shadowy groups that often pay for them.

Within the top 50 TV markets, ProPublica is focusing on ads purchased in swing states like Virginia, Florida, Nevada, and Pennsylvania.

Your mission, should you choose to accept it:

    2. Pick a document — either by television market, or by clicking the random “Give me a file!” button
    3. Answer four questions about that document: Who bought it? What agency? How much? What’s the contract number on the ad buy?

There are plenty of group names already in the system, so when you start typing “American…” for example, a list of nonprofits or super PACs like American Crossroads pops up. It’s a feature that promotes consistency in data entry in the same way that including a contract number is meant to eliminate duplications. Help buttons attached to each question guide volunteers on properly reading the files. There’s also a box to check if you notice “something else notable” about any given file.
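The two consistency mechanisms described above, name suggestions as a volunteer types and contract numbers as a guard against duplicates, can be sketched roughly as follows. This is a minimal illustration, not ProPublica's code: the function names, the dictionary-backed store, and every group name other than American Crossroads are hypothetical.

```python
# Hypothetical sketch; not ProPublica's implementation.
# Only "American Crossroads" comes from the article; the other names are made up.
KNOWN_GROUPS = [
    "American Crossroads",
    "American Example Action",
    "Sample Victory Fund",
]


def suggest(prefix, groups=KNOWN_GROUPS, limit=5):
    """Return known group names that start with the typed prefix (case-insensitive)."""
    typed = prefix.strip().casefold()
    if not typed:
        return []
    matches = sorted(g for g in groups if g.casefold().startswith(typed))
    return matches[:limit]


def add_entry(database, entry):
    """Store an ad-buy entry keyed by contract number.

    Returns True if the entry was new, False if that contract number
    was already present (a likely duplicate).
    """
    key = entry["contract_number"].strip()
    if key in database:
        return False
    database[key] = entry
    return True


if __name__ == "__main__":
    print(suggest("American"))
    db = {}
    print(add_entry(db, {"contract_number": "A-100", "buyer": "American Crossroads"}))
    print(add_entry(db, {"contract_number": "A-100", "buyer": "American Crossroads"}))
```

Offering existing names as suggestions nudges volunteers toward one spelling per group, and keying entries on the contract number means a second entry for the same ad buy is flagged rather than counted twice.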

“The beauty of this is its simplicity,” ProPublica senior engagement editor Amanda Zamora told me. “We’re asking for very specific data points. We’re not asking people to do that next step and say, ‘What kind of group is this?’ We chose to focus, to make it something that people would likely do. We want to give people incentive but also don’t want them to feel, ‘This is a Sisyphean task, we’ll never make it.’”

But the task ahead is still a major one. ProPublica has more than 15,000 files on hand with 40 short days until the presidential election, and each data point on a file needs to be verified by at least two users before it can be officially entered into ProPublica’s database. Developing verification infrastructure has been the biggest challenge for ProPublica’s Al Shaw, who calls the project “one of the biggest and most advanced crowdsourced efforts ever done.” ProPublica is known for its ambitious data projects, but it usually cleans up and organizes data in-house before sharing it with the world.
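The two-user rule might look something like the sketch below. The article says only that at least two users must verify each data point; treating "verified" as two distinct users submitting the same value after light normalization is an assumption made here for illustration, as are all the names in the code.

```python
from collections import defaultdict

# From the article: each data point needs at least two users.
REQUIRED_AGREEMENT = 2


def normalize(value):
    """Collapse whitespace and ignore case so trivial differences still match (assumed rule)."""
    return " ".join(value.split()).casefold()


def verified_value(submissions, required=REQUIRED_AGREEMENT):
    """Return the value that at least `required` distinct users agree on, else None.

    `submissions` is a list of (user_id, value) pairs for one field of one file.
    """
    users_by_value = defaultdict(set)
    first_seen = {}
    for user_id, value in submissions:
        key = normalize(value)
        users_by_value[key].add(user_id)
        first_seen.setdefault(key, value)
    for key, users in users_by_value.items():
        if len(users) >= required:
            return first_seen[key]
    return None


if __name__ == "__main__":
    answers = [("user_1", "American Crossroads"), ("user_2", "american  crossroads")]
    print(verified_value(answers))
```

Counting distinct users rather than raw submissions keeps one volunteer from verifying his or her own entry by submitting it twice.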

“The whole thing is a huge experiment, and we’re not sure if it’s going to work,” Shaw told me. “It’s a news app that our readers are essentially building in real time. One thing we’re worried about: Are we going to have an empty room, just a super structure with no data?”

ProPublica’s also in the thick of another real-time crowdsourced database project this election season. The site has been asking readers to feed its Message Machine with campaign emails — readers also provide demographic information about themselves to ProPublica — in hopes of better understanding how campaigns target different groups. (That layer of analysis comes later. The real-time component is the ability to mouse over a graph of emails, sorted by candidate and subject line.)

ProPublica isn’t measuring its success based on whether an army of volunteers can work their way through every last file, although, of course, that’s an outcome the site would welcome. If this effort helps identify even one otherwise unknown ad buyer, that’s a journalistic victory in Zamora’s eyes.

There’s also an opportunity to pick up where Federal Election Commission filings leave off, and to identify so-called dark money groups that may be spending money on campaigns without reporting it to the FEC. ProPublica has its reporters ready to take the work of the crowd and apply traditional, aggressive reporting techniques. (Bonus: Other news organizations can dip into the database and do their own reporting.)

The basic strategy: Crowdsource the assembly of a database but leave it to the reporters to take on more complicated and time-consuming legwork and analysis. Already ProPublica has a group of more than 500 volunteers — people who said they were willing to physically visit their local TV stations and send files to ProPublica before the FCC required the stations to do it.

For volunteers, incentives are built in all around them. There’s the overarching idea that they’re contributing to important work, but they’ll also get to see the fruits of their labor as it happens. The Free the Files map that they’re populating with data will become more robust with their efforts. There’s also a gamification aspect to the project, which features a leaderboard that ranks volunteers by how many files they’ve freed.

Plus, Zamora set up a Facebook group for the volunteers, a place where people can discuss their work, share information about ads they’ve seen in their home states, and so on. ProPublica has had success with crowdsourced projects in the past when it has assigned meaningful but doable tasks and created forums that reinforce the strength of the community that’s doing the work. ProPublica also frames the mission narrowly: It makes clear not only the goal but also what volunteers should not do.

“We’re saying, ‘Look, this is not a place for political rants or partisanship,’” Zamora said. “We have a mission: We’re trying to increase transparency around political spending, and we’ve got a lot of documents and a lot of work to do in a short amount of time. We’re embarking on something really different, asking our readers to help us fill in the blanks, and visualize and log the data before it’s complete.”

                                   