Nieman Foundation at Harvard
July 1, 2014, 12:07 p.m.
Reporting & Production
Internet Archive

In Philadelphia, the Internet Archive is assembling a new way to monitor campaigns on TV

Every television ad, news segment, political blog, and campaign website in the Philadelphia market will be searchable prior to the November 4 election

Anyone who’s spent time in politics knows the power of television to push messages and shape minds. But measuring its impact can require access to information that can be hard to find outside certain political circles.

The Internet Archive has launched a new project in Philadelphia that tries to address that problem with its institutional strength — gathering and archiving lots of stuff. The Archive is recording every minute of television news in Philly, as well as political ads aired on major broadcast stations. A mere 24 hours after broadcast, it will be possible to rewatch TV content online. In addition, the Archive will crawl content from across the web — news blogs, campaign websites and more — for their Philadelphia digital media landscape collection.

This effort comes in advance of several contested congressional elections in the Philadelphia region this November. Roger Macdonald, director of the Television Archive at the Internet Archive, selected the market for the first geographically based archiving project for this reason. The goal: to provide data for journalists and researchers interested in tracking the media landscape and understanding how political messages — and dollars — move through the system. Using text from closed captioning as well as metadata organized by volunteer viewers, the Philadelphia archive will be searchable by region, station, and date, as well as by campaign issue or ad sponsor.
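For readers who want a concrete picture of that kind of search, here is a minimal sketch in Python. It is not the Internet Archive's actual interface: the `Clip` record, the `search` function, the station names, and the sample captions are all hypothetical, chosen only to show how caption text and metadata filters could combine.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Clip:
    """One recorded segment. All fields are illustrative assumptions."""
    station: str
    air_date: date
    sponsor: str   # empty string for news segments with no ad sponsor
    captions: str  # closed-caption text for the segment


def search(clips: List[Clip],
           text: Optional[str] = None,
           station: Optional[str] = None,
           sponsor: Optional[str] = None,
           start: Optional[date] = None,
           end: Optional[date] = None) -> List[Clip]:
    """Return clips that satisfy every filter that was supplied."""
    results = []
    for clip in clips:
        if text is not None and text.lower() not in clip.captions.lower():
            continue
        if station is not None and clip.station != station:
            continue
        if sponsor is not None and clip.sponsor != sponsor:
            continue
        if start is not None and clip.air_date < start:
            continue
        if end is not None and clip.air_date > end:
            continue
        results.append(clip)
    return results


# Hypothetical sample data, not real broadcasts.
SAMPLE_CLIPS = [
    Clip("STATION-A", date(2014, 10, 1), "Example Committee",
         "Candidate Smith supports clean water."),
    Clip("STATION-B", date(2014, 10, 2), "",
         "Tonight: the debate over fracking in Pennsylvania."),
    Clip("STATION-A", date(2014, 10, 3), "Example Committee",
         "Fracking means jobs, says this ad."),
]

fracking_on_a = search(SAMPLE_CLIPS, text="fracking", station="STATION-A")
```

A real archive would index captions rather than scan them one by one, but the filters a reporter applies would look much the same.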

In the past, Internet Archive data has been used for a variety of research purposes, including measuring how people use gestures, mapping placename mentions in the mainstream media, analyzing sentiment, and tracking word use. Reporters at FiveThirtyEight have used the archive to shore up their reporting.

“At its heart, it’s a library,” says Macdonald. “As a library, it’s an open invitation to come and utilize our resources, collaborate with us to build up these resources for your own institutional benefit, and to elaborate on the information in the library. We’ll try to help people utilize and interact with the data. But we don’t create product. We won’t be saying: This is what you should do with this.”

So what will be done with it? Macdonald cited a paper called “Mapping the Trayvon Martin Media Controversy” by researchers at the MIT Center for Civic Media (who used Media Cloud) as an example of the kind of research that could be done using the new tool. And indeed, some researchers are excitedly awaiting the opportunity to take a look at the Philadelphia data.

Danilo Yanich is an associate professor at the University of Delaware interested in how political ad buys influence and inform news coverage in local television (and, ultimately, policy). In his past work, he and his students have watched and coded over 30,000 hours of local television. His most extensive work thus far has been in Honolulu, where Yanich and his team recorded 100 news stories and 600 political ads during the 2012 general election. Not one of those news stories, Yanich says, addressed issues and claims made in the political advertisements. Now, with access to the Internet Archive’s data out of Philadelphia, the amount of information Yanich and his team have access to has doubled.

“The questions are: What are the issues presented in political ads? Are those issues covered in local political news stories? And if they are covered, are they addressed in a critical fashion in which there is an evaluation of the claims that are made?” says Yanich.

But because he’s an academic, Yanich’s findings won’t be published until long after election day in Pennsylvania. “One of the great challenges has always been that people look in retrospect and get great insight, but the voters miss the benefit of journalism, to help them make more informed choices,” says Macdonald. “I met with several reporters from the Inquirer a month and a half ago who expressed an enormous amount of interest. I learned from them that they thought it would be of great value in the campaign context.”

Also interested in helping the data reach newsrooms is the Sunlight Foundation, specifically Kathy Kiely, via the Political Ad Sleuth project. Ad Sleuth started as a crowdsourced operation that organized volunteers to visit TV stations, where they would copy files that show which special interest groups are buying political ads. “Under federal law, these groups are required to file a form that indicates who their top executives are, or who their board of directors are, which is all a good political reporter needs to start figuring out who’s behind these groups,” Kiely says.

Starting today, all broadcast stations are required to file that information digitally, meaning Ad Sleuth will have a lot more information in its database. “You can enter the name of a committee in Political Ad Sleuth, and it will tell you every single place that a committee has bought ads. You can sort by state, you can sort by TV market, you can sort by date. It really helps reporters provide context, understand who’s advertising in the market,” says Kiely.

But the one thing the Ad Sleuth files don’t show is what’s actually in an ad — you know who bought it, but not what it says. By combining that data with the Philadelphia recordings, however, it will be possible to see all of that information in one place. “You’ll be able to take this soft little ad about puppy dogs and snails and kitty cat tails and connect it to the people who want to do fracking,” says Kiely. “That is the beauty of this.”

For now, though, there’s no direct digital connection between the two, and Kiely says she hopes reporters will “act as a crosswalk” between the Internet Archive and Ad Sleuth. “There are a million stories in the database that people who know things I don’t know will be able to find,” she says. “We want reporters to know about this tool and to use it.”
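Here is a rough Python sketch of what that crosswalk might look like if a reporter had exported records from both sources. Everything in it is an assumption for illustration: neither the record layouts, the field names (`committee`, `officers`, `sponsor`, `station`, `captions`), nor the sample committee come from Ad Sleuth or the Internet Archive. It simply matches ad-buy filings to recorded ads on a normalized sponsor name.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase a sponsor name and collapse punctuation and extra spaces."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def crosswalk(ad_buys, recorded_ads):
    """Pair each ad-buy filing with recorded ads from the same sponsor."""
    ads_by_sponsor = defaultdict(list)
    for ad in recorded_ads:
        ads_by_sponsor[normalize(ad["sponsor"])].append(ad)

    matches = []
    for buy in ad_buys:
        for ad in ads_by_sponsor.get(normalize(buy["committee"]), []):
            matches.append({
                "committee": buy["committee"],
                "officers": buy["officers"],
                "station": ad["station"],
                "captions": ad["captions"],
            })
    return matches


# Hypothetical records, invented for this example.
AD_BUYS = [
    {"committee": "Example Energy Alliance, Inc.",
     "officers": ["A. Person", "B. Person"]},
    {"committee": "Neighbors for Example", "officers": ["C. Person"]},
]
RECORDED_ADS = [
    {"sponsor": "example energy alliance inc", "station": "STATION-A",
     "captions": "Puppies, kittens, and a brighter tomorrow."},
    {"sponsor": "Unrelated Group", "station": "STATION-B",
     "captions": "Vote on November 4."},
]

linked = crosswalk(AD_BUYS, RECORDED_ADS)
```

Exact-name matching like this would miss sponsors whose names are spelled differently in the two sources, which is one reason a human crosswalk still matters.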

Non-journalist volunteers will also be needed to make this project come together. Important metadata like the political ad buy files exist as PDFs, which a person must manually convert into searchable data. Volunteers are also needed to watch and manually tag the broadcast data, separating news segments from ads. Macdonald hopes these volunteers will help train an algorithm that can do this work automatically (ideally, such a program would also be able to differentiate between news story topics), but he says it’s unlikely it would be operational before the end of the year.

If all goes well in Philadelphia, the next step for the Internet Archive is to record and crawl media markets across the country for the 2016 election. “We want to move not just to big media markets, but some of the smaller markets — those where there’s a lot of ethnic and cultural diversity,” Macdonald says. “Those communities are overlooked in many ways, and we think that bringing our library of resources to bear on their news may help bring some of their issues to the attention of the rest of the nation.”

The Internet Archive’s project in Philadelphia will continue to expand and incorporate more community partners as the election nears; the organization recently received a $15,000 grant from the Philadelphia Foundation. It remains to be seen what kind of stories will emerge from the data gathered in Philadelphia, but the Internet Archive’s potential for social impact can only grow as its stores of information expand.

Photo via Scott Beale/Laughing Squid used under a Creative Commons license.
