Sept. 27, 2012, 11 a.m.
Reporting & Production

Free the Files! ProPublica taps the crowd for a database-building sprint to election day

The site wants your help assembling a database of television ad buys in swing states.

Political transparency geeks got both good news and bad news from the Federal Communications Commission last April.

Good news first: The FCC decided it would require television stations to put information about political ad buys online.

The bad news: Only stations from the top 50 markets are required to do so, and stations can post the files as image PDFs — meaning there’s no easy way to search records by the name of the ad buyer from the FCC database.

And what good is a bunch of data if you can’t extract meaning?

ProPublica is coming to the rescue — it hopes, with your help — with a project launched today called Free the Files, the latest iteration of an ongoing effort to examine political ads and the shadowy groups that often pay for them.

Within the top 50 TV markets, ProPublica is focusing on ads purchased in swing states like Virginia, Florida, Nevada, and Pennsylvania.

Your mission, should you choose to accept it:

    2. Pick a document — either by television market, or by clicking the random “Give me a file!” button
    3. Answer four questions about that document: Who bought it? What agency? How much? What’s the contract number on the ad buy?

There are plenty of group names already in the system, so when you start typing “American…” for example, a list of nonprofits or super PACs like American Crossroads pops up. It’s a feature that promotes consistency in data entry in the same way that including a contract number is meant to eliminate duplications. Help buttons attached to each question guide volunteers on properly reading the files. There’s also a box to check if you notice “something else notable” about any given file.

“The beauty of this is its simplicity,” ProPublica senior engagement editor Amanda Zamora told me. “We’re asking for very specific data points. We’re not asking people to do that next step and say, ‘What kind of group is this?’ We chose to focus, to make it something that people would likely do. We want to give people incentive but also don’t want them to feel, ‘This is a Sisyphean task, we’ll never make it.'”

But the task ahead is still a major one. ProPublica has more than 15,000 files on hand with 40 short days until the presidential election, and each datapoint on a file needs to be verified by at least two users before it can be officially entered into ProPublica’s database. Developing verification infrastructure has been the biggest challenge for ProPublica’s Al Shaw, who calls the project “one of the biggest and most advanced crowdsourced efforts ever done.” ProPublica is known for its ambitious data projects, but it usually cleans up and organizes data in-house before sharing it with the world.
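The two-user rule can be pictured with a small sketch. This is a hypothetical illustration, not ProPublica's code: the article does not describe how the verification system is built, so the function name, the data layout, and the case-insensitive comparison are all assumptions. Only the threshold of two users comes from the reporting above.

```python
from collections import Counter

# From the article: each datapoint must be verified by at least two users.
REQUIRED_AGREEMENT = 2


def verified_value(entries, required=REQUIRED_AGREEMENT):
    """Return the answer that at least `required` distinct users agree on.

    `entries` is a list of (user_id, value) pairs for one field of one file
    (a hypothetical layout). Returns None if no answer has enough agreement.
    """
    # Keep one answer per user, so a single volunteer cannot verify
    # their own entry by submitting it twice.
    answer_by_user = {}
    for user_id, value in entries:
        # Assumed normalization: ignore case and surrounding whitespace.
        answer_by_user[user_id] = value.strip().lower()

    counts = Counter(answer_by_user.values())
    if not counts:
        return None

    value, votes = counts.most_common(1)[0]
    return value if votes >= required else None
```

Under these assumptions, a buyer name entered by two different volunteers would be accepted, while the same name typed twice by one volunteer would stay unverified.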

“The whole thing is a huge experiment, and we’re not sure if it’s going to work,” Shaw told me. “It’s a news app that our readers are essentially building in real time. One thing we’re worried about: Are we going to have an empty room, just a super structure with no data?”

ProPublica’s also in the thick of another real-time crowdsourced database project this election season. The site has been asking readers to feed its Message Machine with campaign emails — readers also provide demographic information about themselves to ProPublica — in hopes of better understanding how campaigns target different groups. (That layer of analysis comes later. The real-time component is the ability to mouse over a graph of emails, sorted by candidate and subject line.)

ProPublica isn’t measuring its success based on whether an army of volunteers can work their way through every last file, although, of course, that’s an outcome the site would welcome. If this effort helps identify even one otherwise unknown ad buyer, that’s a journalistic victory in Zamora’s eyes.

There’s also an opportunity to pick up where Federal Election Commission filings leave off, as well as to identify so-called dark money groups that may be spending money on campaigns without reporting it to the FEC. ProPublica has its reporters ready to take the work of the crowd and apply traditional, aggressive reporting techniques. (Bonus: Other news organizations can dip into the database and do their own reporting.)

The basic strategy: Crowdsource the assembly of a database but leave it to the reporters to take on more complicated and time-consuming legwork and analysis. Already ProPublica has a group of more than 500 volunteers — people who said they were willing to physically visit their local TV stations and send files to ProPublica before the FCC required the stations to do it.

For volunteers, incentives are built in all around them. There’s the overarching idea that they’re contributing to important work, but they’ll also get to see the fruits of their labor as it happens. The Free the Files map that they’re populating with data will become more robust with their efforts. There’s also a gamification aspect to the project, which features a leaderboard that ranks volunteers by how many files they’ve freed.

Plus, Zamora set up a Facebook group for the volunteers, a place where people can discuss their work, share information about ads they’ve seen in their home states, and so on. ProPublica has had success with crowdsourced projects in the past when it has assigned meaningful but doable tasks and created forums that reinforce the strength of the community that’s doing the work. ProPublica also frames the mission narrowly: that means not only making the goal clear but also telling volunteers explicitly what not to do.

“We’re saying, ‘Look, this is not a place for political rants or partisanship,'” Zamora said. “We have a mission: We’re trying to increase transparency around political spending, and we’ve got a lot of documents and a lot of work to do in a short amount of time. We’re embarking on something really different, asking our readers to help us fill in the blanks, and visualize and log the data before it’s complete.”
