Nieman Foundation at Harvard
March 9, 2016, 9:50 a.m.
Aggregation & Discovery

The Political TV Ad Archive is making it easier for journalists to report on campaign spots

The database run by the Internet Archive is collecting ads in 20 markets across eight states.

As the campaign in New Hampshire intensified leading up to the state’s presidential primary, New Hampshire Public Radio reporter Brian Wallstin wanted to understand more about the glut of political ads that were blanketing the state’s TV stations.

By law, TV and radio stations are required to disclose who is buying political ads, but these disclosures weren’t enough for Wallstin: They don’t tell you details such as the content of the ads or how many times each spot has run.

So he turned to the Political TV Ad Archive, a project from the Internet Archive that is building a searchable archive of political ads throughout the 2016 primary season in partnership with a number of fact-checking, transparency, and journalism advocacy groups.


Wallstin used the archive to build a database of New Hampshire ads, and after the primary on February 9, he wrote a definitive wrap-up of advertising in the race.

“This seemed like a good way to follow up on and build on the knowledge I was trying to obtain from the public filings with the FCC,” Wallstin told me. “It was particularly helpful for me to track the evolution of the messaging, from more positive or biographical advertising to more direct attacks on other candidates.”

The archive, which is free to use and open-source, has been used by outlets of all sizes — from The Economist and FiveThirtyEight to New Hampshire Public Radio.

It launched in January and covers 20 media markets in eight states. It also includes ads that aired exclusively online, and it covers not just the presidential race but other races as well.

The archive is also partnering with the American Press Institute, Center for Public Integrity, Center for Responsive Politics, Duke Reporters’ Lab, PolitiFact, and The Washington Post’s Fact Checker to add more context to the database, fact-check the ads, and provide training.

The Political TV Ad Archive grew out of a project the Internet Archive ran in 2014 to document and catalog political coverage and ads in Philadelphia leading up to that year’s midterm elections.

“In Philadelphia, they were using people to identify the ads as they aired on TV,” said Nancy Watzman, managing editor of the Internet Archive’s Television Archive. “We’re still doing some of that, but we’re trying to use more sophisticated tech [too].”

The Political TV Ad Archive scrapes the broadcasts and uses an audio fingerprinting tool, called The Duplitron, to identify the ads and count how many times they’ve been played. Internet Archive senior engineer Dan Schultz — familiar to longtime Nieman Lab readers for previous projects like Truth Goggles and Opened Captions — built the tool off of another open-source effort developed at Columbia University.


The ads are then uploaded to the database, where they can be downloaded, shared, embedded, or edited. The archive lists information about each ad, including its sponsor, the topics it covers, the candidates it mentions, and the shows during which it aired.

“We can’t promise that we’re catching every single instance of every single ad, but we’re doing our best and we’re improving our process as we go along,” Watzman said.


Last month the archive also began saving copies of everything that is posted to candidates’ social media accounts.

The Internet Archive isn’t the only group collecting and analyzing political ads (the archive even shares some other projects on its site), but others focus solely on historic ads or mainly use government filings, which don’t provide a particularly detailed picture of an ad’s content. Other efforts that offer similar levels of detail are paid products that may be too expensive for newsrooms.

The archive is primarily supported through a $200,000 grant from the Knight Foundation via the Knight News Challenge. (Disclosure: Knight also funds Nieman Lab, though not through the News Challenge.)

In addition to its support from Knight, the archive has also received $50,000 in funding from the Democracy Fund to work with the American Press Institute to run training sessions for journalists on how to use the archive.

For example, API went to Miami and held a training session for Univision reporters. It also ran a session for students and journalists at Hampton University, a historically black university in Hampton, Virginia. Watzman also presented the archive at a gathering of all the local iterations of PolitiFact last month.

For now, the archive only has the resources to continue through the primaries, though Watzman said it’s looking for funding to continue running the archive through this fall’s general election.

Outlets of all types have used the archive’s content in their coverage. The Atlantic’s Andrew McGill built an arcade-style video game that drew on the archive’s data from Iowa to show how unavoidable political ads were in the days leading up to the caucuses. Players move a remote-wielding TV watcher on a couch across TV listings, trying to dodge the oncoming ads.


Vox reporter Alvin Chang used the archive to watch more than 100 different ads that aired in Iowa. (“I watched them more than once. I am still alive.”) Fusion analyzed which national TV shows the candidates most liked to advertise on. (Donald Trump likes Jimmy Fallon. Hillary Clinton? The Ellen Show.) The Washington Post used artificial intelligence to study the content of the ads. (“The Vision API also estimated that about 0.4 percent of faces showed anger or sorrow.”)

“What’s wonderful about it is that there are so many things you can do with it,” Watzman said.

The archive’s partnerships with various fact-checking groups allow it to add extra contextual information. Whenever one of the archive’s partner organizations runs a fact check on an ad, that information is entered on the ad’s page in the database.


The archive has also made fact-checkers’ jobs easier, said Aaron Sharockman, PolitiFact’s executive director. Instead of spending hours trying to find the ads, PolitiFact can use the archive as a starting point and focus on actually researching them. The archive, Sharockman estimated, has saved two to three hours per fact check.

“That adds up to an extra fact check or two per week,” he said.

Photo of a 2008 John McCain ad by katherine of chicago used under a Creative Commons license.
