Sept. 27, 2012, 11 a.m.
Reporting & Production

Free the Files! ProPublica taps the crowd for a database-building sprint to election day

The site wants your help assembling a database of television ad buys in swing states.

Political transparency geeks got both good news and bad news from the Federal Communications Commission last April.

Good news first: The FCC decided it would require television stations to put information about political ad buys online.

The bad news: Only stations from the top 50 markets are required to do so, and stations can post the files as image PDFs — meaning there’s no easy way to search records by the name of the ad buyer from the FCC database.

And what good is a bunch of data if you can’t extract meaning?

ProPublica is coming to the rescue — it hopes, with your help — with a project launched today called Free the Files, the latest iteration of an ongoing effort to examine political ads and the shadowy groups that often pay for them.

Within the top 50 TV markets, ProPublica is focusing on ads purchased in swing states like Virginia, Florida, Nevada, and Pennsylvania.

Your mission, should you choose to accept it:

    2. Pick a document — either by television market, or by clicking the random “Give me a file!” button
    3. Answer four questions about that document: Who bought it? What agency? How much? What’s the contract number on the ad buy?

There are plenty of group names already in the system, so when you start typing “American…” for example, a list of nonprofits or super PACs like American Crossroads pops up. It’s a feature that promotes consistency in data entry in the same way that including a contract number is meant to eliminate duplications. Help buttons attached to each question guide volunteers on properly reading the files. There’s also a box to check if you notice “something else notable” about any given file.
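For readers curious how those two safeguards might work, here is a minimal Python sketch. It is an illustration only, not ProPublica's code: the function names `suggest` and `dedupe_by_contract`, the `contract_number` and `buyer` fields, and every group name except American Crossroads are hypothetical.

```python
# Illustrative sketch only; not ProPublica's implementation.
# All names are hypothetical except "American Crossroads", which the article mentions.

def suggest(prefix, known_names, limit=5):
    """Return up to `limit` known names that start with `prefix`, ignoring case."""
    wanted = prefix.strip().casefold()
    if not wanted:
        return []
    matches = [n for n in known_names if n.casefold().startswith(wanted)]
    return sorted(matches)[:limit]


def dedupe_by_contract(entries):
    """Keep the first entry seen for each contract number (assumed field name)."""
    first_seen = {}
    for entry in entries:
        # Normalize so "a-100" and "A-100 " count as the same contract.
        key = entry["contract_number"].strip().upper()
        first_seen.setdefault(key, entry)
    return list(first_seen.values())


known = ["American Crossroads", "American Example Fund", "Placeholder PAC"]
print(suggest("american", known))

entries = [
    {"contract_number": "a-100", "buyer": "American Crossroads"},
    {"contract_number": "A-100 ", "buyer": "American Crossroads"},
    {"contract_number": "B-200", "buyer": "Placeholder PAC"},
]
print(len(dedupe_by_contract(entries)))
```

Offering volunteers a fixed list of names means the same buyer is spelled the same way every time, and keying on the contract number lets two transcriptions of the same ad buy collapse into one record.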

“The beauty of this is its simplicity,” ProPublica senior engagement editor Amanda Zamora told me. “We’re asking for very specific data points. We’re not asking people to do that next step and say, ‘What kind of group is this?’ We chose to focus, to make it something that people would likely do. We want to give people incentive but also don’t want them to feel, ‘This is a Sisyphean task, we’ll never make it.’”

But the task ahead is still a major one. ProPublica has more than 15,000 files on hand with 40 short days until the presidential election, and each datapoint on a file needs to be verified by at least two users before it can be officially entered into ProPublica’s database. Developing verification infrastructure has been the biggest challenge for ProPublica’s Al Shaw, who calls the project “one of the biggest and most advanced crowdsourced efforts ever done.” ProPublica is known for its ambitious data projects, but it usually cleans up and organizes data in-house before sharing it with the world.
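The article doesn't spell out how that double-check works under the hood. One simple reading, sketched below in Python, is that a value counts as verified once at least two different users have entered the same thing. The function name `verified_value`, the `(user_id, value)` input shape, and the normalization rule are all assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative sketch only; the agreement rule is an assumption, not ProPublica's code.

def verified_value(submissions, min_agree=2):
    """Return the normalized value that at least `min_agree` distinct users
    entered, or None if no value (or more than one value) reaches that bar.

    `submissions` is an iterable of (user_id, value) pairs.
    """
    users_by_value = defaultdict(set)
    for user_id, value in submissions:
        # Collapse whitespace and case so trivial typing differences still match.
        normalized = " ".join(value.split()).casefold()
        users_by_value[normalized].add(user_id)

    agreed = [v for v, users in users_by_value.items() if len(users) >= min_agree]
    return agreed[0] if len(agreed) == 1 else None


print(verified_value([("u1", "American Crossroads"), ("u2", "american  crossroads")]))
print(verified_value([("u1", "$1,200"), ("u1", "$1,200")]))
```

Counting distinct users rather than raw submissions means one volunteer entering the same answer twice does not verify it, and returning nothing when two competing answers both reach the threshold leaves the conflict for a person to resolve.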

“The whole thing is a huge experiment, and we’re not sure if it’s going to work,” Shaw told me. “It’s a news app that our readers are essentially building in real time. One thing we’re worried about: Are we going to have an empty room, just a super structure with no data?”

ProPublica’s also in the thick of another real-time crowdsourced database project this election season. The site has been asking readers to feed its Message Machine with campaign emails — readers also provide demographic information about themselves to ProPublica — in hopes of better understanding how campaigns target different groups. (That layer of analysis comes later. The real-time component is the ability to mouse over a graph of emails, sorted by candidate and subject line.)

ProPublica isn’t measuring its success based on whether an army of volunteers can work their way through every last file, although, of course, that’s an outcome the site would welcome. If this effort helps identify even one otherwise unknown ad buyer, that’s a journalistic victory in Zamora’s eyes.

There’s also an opportunity to pick up where Federal Election Commission filings leave off, as well as to identify so-called dark money groups that may be spending money on campaigns without reporting it to the FEC. ProPublica has its reporters ready to take the work of the crowd and apply traditional, aggressive reporting techniques. (Bonus: Other news organizations can dip into the database and do their own reporting.)

The basic strategy: Crowdsource the assembly of a database but leave it to the reporters to take on more complicated and time-consuming legwork and analysis. Already ProPublica has a group of more than 500 volunteers — people who said they were willing to physically visit their local TV stations and send files to ProPublica before the FCC required the stations to do it.

For volunteers, incentives are built in all around them. There’s the overarching idea that they’re contributing to important work, but they’ll also get to see the fruits of their labor as it happens. The Free the Files map that they’re populating with data will become more robust with their efforts. There’s also a gamification aspect to the project, which features a leaderboard that ranks volunteers by how many files they’ve freed.

Plus, Zamora set up a Facebook group for the volunteers, a place where people can discuss their work, share information about ads they’ve seen in their home states, and so on. ProPublica has had success with crowdsourced projects in the past when it has assigned meaningful but doable tasks and created forums that reinforce the strength of the community that’s doing the work. ProPublica also frames the mission narrowly: that means not only making the goal clear but also explicitly telling volunteers what not to do.

“We’re saying, ‘Look, this is not a place for political rants or partisanship,’” Zamora said. “We have a mission: We’re trying to increase transparency around political spending, and we’ve got a lot of documents and a lot of work to do in a short amount of time. We’re embarking on something really different, asking our readers to help us fill in the blanks, and visualize and log the data before it’s complete.”
