Sept. 27, 2012, 11 a.m.
Reporting & Production

Free the Files! ProPublica taps the crowd for a database-building sprint to election day

The site wants your help assembling a database of television ad buys in swing states.

Political transparency geeks got both good news and bad news from the Federal Communications Commission last April.

Good news first: The FCC decided it would require television stations to put information about political ad buys online.

The bad news: Only stations from the top 50 markets are required to do so, and stations can post the files as image PDFs — meaning there’s no easy way to search records by the name of the ad buyer from the FCC database.

And what good is a bunch of data if you can’t extract meaning?

ProPublica is coming to the rescue — it hopes, with your help — with a project launched today called Free the Files, the latest iteration of an ongoing effort to examine political ads and the shadowy groups that often pay for them.

Within the top 50 TV markets, ProPublica is focusing on ads purchased in swing states like Virginia, Florida, Nevada, and Pennsylvania.

Your mission, should you choose to accept it:

    2. Pick a document — either by television market, or by clicking the random “Give me a file!” button
    3. Answer four questions about that document: Who bought it? What agency? How much? What’s the contract number on the ad buy?

There are plenty of group names already in the system, so when you start typing “American…” for example, a list of nonprofits or super PACs like American Crossroads pops up. It’s a feature that promotes consistency in data entry in the same way that including a contract number is meant to eliminate duplications. Help buttons attached to each question guide volunteers on properly reading the files. There’s also a box to check if you notice “something else notable” about any given file.
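To make those two data-entry safeguards concrete, here is a minimal sketch of how name suggestions and contract-number deduplication could work. It is not ProPublica's code: the function names, the `contract_number` field, and the sample list of groups are all assumptions made for illustration.

```python
# Illustrative sample only; not ProPublica's actual list of groups.
SAMPLE_GROUPS = [
    "Restore Our Future",
    "Americans for Prosperity",
    "American Crossroads",
    "Priorities USA Action",
]


def suggest(prefix, names, limit=5):
    """Return up to `limit` names that start with `prefix`, ignoring case."""
    wanted = prefix.strip().casefold()
    if not wanted:
        return []
    matches = [n for n in names if n.casefold().startswith(wanted)]
    return sorted(matches, key=str.casefold)[:limit]


def dedupe_by_contract(entries):
    """Keep the first entry seen for each normalized contract number.

    Assumes each entry is a dict with a hypothetical "contract_number" key.
    """
    seen = set()
    unique = []
    for entry in entries:
        key = entry["contract_number"].strip().upper()
        if key in seen:
            continue  # same contract already logged; skip the duplicate
        seen.add(key)
        unique.append(entry)
    return unique
```

Offering existing names as the volunteer types nudges everyone toward one spelling per group, and normalizing the contract number before comparing means stray spaces or lowercase letters do not hide a duplicate.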

“The beauty of this is its simplicity,” ProPublica senior engagement editor Amanda Zamora told me. “We’re asking for very specific data points. We’re not asking people to do that next step and say, ‘What kind of group is this?’ We chose to focus, to make it something that people would likely do. We want to give people incentive but also don’t want them to feel, ‘This is a Sisyphean task, we’ll never make it.’”

But the task ahead is still a major one. ProPublica has more than 15,000 files on hand with 40 short days until the presidential election, and each datapoint on a file needs to be verified by at least two users before it can be officially entered into ProPublica’s database. Developing verification infrastructure has been the biggest challenge for ProPublica’s Al Shaw, who calls the project “one of the biggest and most advanced crowdsourced efforts ever done.” ProPublica is known for its ambitious data projects, but it usually cleans up and organizes data in-house before sharing it with the world.
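The two-user rule can be sketched in a few lines. This is only an illustration of the idea, not the verification system Shaw built: the tuple layout, the field names, and the way conflicts are handled are assumptions.

```python
from collections import defaultdict

REQUIRED_AGREEMENT = 2  # the article's "at least two users"


def normalize(value):
    """Collapse whitespace and ignore case so near-identical answers match."""
    return " ".join(str(value).split()).casefold()


def verify(submissions, required=REQUIRED_AGREEMENT):
    """Return (verified, conflicts) for (file_id, field, user, value) tuples.

    A datapoint counts as verified once `required` distinct users have
    entered the same normalized value. If two different values both reach
    that threshold, the datapoint is reported as a conflict instead.
    """
    voters = defaultdict(set)  # (file_id, field, value) -> users who entered it
    for file_id, field, user, value in submissions:
        voters[(file_id, field, normalize(value))].add(user)

    verified, conflicts = {}, set()
    for (file_id, field, value), users in voters.items():
        if len(users) < required:
            continue
        key = (file_id, field)
        if key in verified or key in conflicts:
            verified.pop(key, None)
            conflicts.add(key)
        else:
            verified[key] = value
    return verified, conflicts
```

Counting distinct users rather than raw submissions means one volunteer entering the same answer twice does not verify it alone.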

“The whole thing is a huge experiment, and we’re not sure if it’s going to work,” Shaw told me. “It’s a news app that our readers are essentially building in real time. One thing we’re worried about: Are we going to have an empty room, just a superstructure with no data?”

ProPublica’s also in the thick of another real-time crowdsourced database project this election season. The site has been asking readers to feed its Message Machine with campaign emails — readers also provide demographic information about themselves to ProPublica — in hopes of better understanding how campaigns target different groups. (That layer of analysis comes later. The real-time component is the ability to mouse over a graph of emails, sorted by candidate and subject line.)

ProPublica isn’t measuring its success based on whether an army of volunteers can work their way through every last file, although, of course, that’s an outcome the site would welcome. If this effort helps identify even one otherwise unknown ad buyer, that’s a journalistic victory in Zamora’s eyes.

There’s also an opportunity to pick up where Federal Election Commission filings leave off, and to identify so-called dark money groups that may be spending money on campaigns without reporting it to the FEC. ProPublica has its reporters ready to take the work of the crowd and apply traditional, aggressive reporting techniques. (Bonus: Other news organizations can dip into the database and do their own reporting.)

The basic strategy: Crowdsource the assembly of a database but leave it to the reporters to take on more complicated and time-consuming legwork and analysis. Already ProPublica has a group of more than 500 volunteers — people who said they were willing to physically visit their local TV stations and send files to ProPublica before the FCC required the stations to do it.

For volunteers, incentives are built in all around them. There’s the overarching idea that they’re contributing to important work, but they’ll also get to see the fruits of their labor as it happens. The Free the Files map that they’re populating with data will become more robust with their efforts. There’s also a gamification aspect to the project, which features a leaderboard that ranks volunteers by how many files they’ve freed.

Plus, Zamora set up a Facebook group for the volunteers, a place where people can discuss their work, share information about ads they’ve seen in their home states, and so on. ProPublica has had success with crowdsourced projects in the past when it has assigned meaningful but doable tasks and created forums that reinforce the strength of the community doing the work. ProPublica also frames the mission narrowly: it makes clear not only the goal but also what volunteers should not do.

“We’re saying, ‘Look, this is not a place for political rants or partisanship,’” Zamora said. “We have a mission: We’re trying to increase transparency around political spending, and we’ve got a lot of documents and a lot of work to do in a short amount of time. We’re embarking on something really different, asking our readers to help us fill in the blanks, and visualize and log the data before it’s complete.”
