Nieman Foundation at Harvard
March 20, 2013, 10:13 a.m.

Tracking memes across television news: A tool for analyzing how stories move through broadcast

Want to see when a story first reached national television news, or which network programs kept it in the limelight? This Ruby script can help.

Too long, didn’t read: You can use this Ruby script to query the Internet Archive’s recently launched TV News archive and download JSON files with the results. It’s great for tracking how frequently a person or topic shows up in U.S. televised news broadcasts.

One of the goals of our research at the MIT Center for Civic Media is to better quantify media attention. We want to know which stories, people, and events our society is paying attention to, which we are missing, and the role our media plays in determining what we see. We’re working with the fine people at Harvard’s Berkman Center to build out Media Cloud. In addition to building tools, we also investigate case studies of news stories that offer greater insight into how the news plays out. You may have seen Yochai Benkler’s investigation into SOPA and PIPA and a networked movement’s success driving the media narrative.

Last year, I wrote about the Trayvon Martin story’s ascent from local blurb to national media trend. This initial analysis relied on a mixture of sources: interviews with key actors, petition data, audience reach metrics from various news sources, Google Trends, visualizations of physical front pages of newspapers, and the Pew Project for Excellence in Journalism’s News Coverage Index.

After the post attracted some interest, we investigated further with additional datasets. With the help of my colleague Erhardt Graeff, we added Media Cloud to the mix. Media Cloud allows us to see who’s talking about the story on blogs and webpages, which voices are dominating the discussion, and which words and frames they’re introducing to the narrative. (More on our findings to follow). I also added Twitter firehose data to the mix, thanks to General Sentiment.

Even though we’re excited about the potential of participatory media to help shape what we pay attention to, our media research consistently finds that broadcast media — primarily TV — still plays a critical role in the amplification of voices. This was especially true for the Trayvon Martin story, which moved relatively quickly from obscurity to wall-to-wall cable news coverage.

At the same time, the team at the Internet Archive, that quintessential Internet resource, was busy launching their TV News archive. The search tool queries more than 410,000 broadcasts, dating from June 2009 up to 24 hours ago. (They are working to extend the archive further back into history.) Each search result is a video clip cued to the section of the broadcast containing your query. It’s a fun interface for exploration, but if you’re looking at how a story trends over several months, you need something more systematic, and may want to use our script.

The team is working to build out the platform and improve the user experience. They don’t currently have the capacity to guide and support researchers, but they do want to get this data into the hands of the curious as soon as possible. To that end, they’ve given me permission to share this quick Ruby script I wrote with the help of my colleague Rahul Bhargava.

How to use the script

  1. Download the file from GitHub.
  2. Open it in a text editor (like TextWrangler or BBEdit), edit line 11 of the code to change 'Your Query' to your preferred search term(s), and save it.
  3. Go to the command line (Terminal on a Mac, DOS or Cygwin in Windows)
  4. Navigate to the folder that contains the script
  5. Type in ruby followed by the script’s filename and hit Enter.

Your results will show up in the same directory as the script itself. The results returned will be in JSON, the open data format. You can adjust how many results to return at once (by changing the ROWS variable in the script), but go easy on the Internet Archive’s servers: You’ll get your results faster (nearly instantly) in smaller batches of 200 or so.
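To make the batching advice concrete, here is a minimal sketch of how a large download could be split into requests of 200 rows each. It only builds the request URLs and does not contact any server. The base URL and the parameter names ("q", "rows", "start", "output") are placeholders I made up for illustration; use whatever the actual script and the search results URL show.

```ruby
require "uri"

# Placeholder values, NOT the real endpoint: swap in the URL and
# parameter names used by the actual script.
BASE_URL = "https://example.org/tv-search"
ROWS = 200 # results per request; small batches come back fastest

# Build one URL per batch so that no single request asks for too much.
def batch_urls(query, total, rows: ROWS, base_url: BASE_URL)
  (0...total).step(rows).map do |start|
    params = { "q" => query, "rows" => rows, "start" => start, "output" => "json" }
    "#{base_url}?#{URI.encode_www_form(params)}"
  end
end

urls = batch_urls("Trayvon Martin", 500)
urls.each { |url| puts url }
```

Fetching the batches one after another, with a short pause in between, keeps the load on the servers light.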

Once you have your data, you can combine, clean, and parse it with Google Refine. I found ProPublica’s guide to cleaning messy data really helpful. You may also want to de-duplicate, because the Internet Archive records TV news broadcasts on both the East and West coasts.
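If you would rather de-duplicate in Ruby than in Google Refine, something like the following sketch may be enough. The field names ("program", "station", "snippet") and the sample records are assumptions for illustration, not the archive’s actual schema; check the keys in your own JSON files first.

```ruby
require "json"

# Sample records standing in for downloaded results. Field names and
# values are assumed for illustration only.
raw = <<~JSON
  [
    {"program": "Evening News", "station": "KGO",  "snippet": "Jury selection began today"},
    {"program": "Evening News", "station": "WJLA", "snippet": "jury selection  began today"},
    {"program": "Morning Show", "station": "KGO",  "snippet": "A new report was released"}
  ]
JSON

records = JSON.parse(raw)

# Treat two records as the same segment when the program name matches and
# the text matches after lowercasing and collapsing whitespace.
# Array#uniq keeps the first record from each group.
def dedupe(records)
  records.uniq do |r|
    [r["program"], r["snippet"].to_s.downcase.gsub(/\s+/, " ").strip]
  end
end

unique = dedupe(records)
puts "#{records.length} records, #{unique.length} after de-duplication"
```

Matching on program and text is a deliberately crude rule; whether it is too strict or too loose depends on how the same broadcast appears in your results.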

What you can do with it

Analyze a story: You could search for a specific story, like the recent controversial Steubenville rape case, and quickly get a sense of which news companies are covering the case and which words they use to talk about it. You can also share links to specific clips with your friends and colleagues.

You could also investigate our professional media’s treatment of a broader topic. You could trace the spread of the phrase “Obamacare” or watch the many breathless news segments covering “technology.”

Visualize TV news data: You’ll also have the data you need to visualize the lifespan of a story on televised news broadcasts. The Internet Archive renders a small line graph in your search results, but the JSON data will allow you to do much more.
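As a starting point for a chart, you could count mentions per day. This is a rough sketch that assumes each record carries a "date" field holding an ISO 8601 timestamp; that field name and the sample values are my assumptions, so adjust them to match your JSON.

```ruby
require "date"

# Sample records; the "date" key and its format are assumptions.
clips = [
  { "date" => "2013-03-17T18:30:00Z" },
  { "date" => "2013-03-17T23:00:00Z" },
  { "date" => "2013-03-19T07:00:00Z" }
]

# Count clips per calendar day, filling in zero for days with no mentions
# so that a line graph does not skip the quiet days.
def mentions_per_day(records)
  counts = Hash.new(0)
  records.each { |r| counts[Date.parse(r["date"])] += 1 }
  return {} if counts.empty?

  first, last = counts.keys.minmax
  (first..last).map { |day| [day, counts[day]] }.to_h
end

daily = mentions_per_day(clips)
daily.each { |day, n| puts "#{day}: #{n}" }
```

The resulting day-by-day counts can be pasted into a spreadsheet or handed to any charting library.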

For example, in the Trayvon Martin case study, we ended up normalizing the data with the number of Trayvon mentions in the printed press, blogosphere, on Twitter, and across other channels to determine when interest began and peaked. As you can see with the green bars below, TV news was an important channel in the early stages of the Trayvon Martin story.


This data source helped us determine that TV news led the press and other media in making (and keeping) Trayvon Martin national news.
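Raw counts from TV, Twitter, and the press sit on very different scales, so some normalization is needed before comparing them. The sketch below shows one simple option, scaling each channel to its own peak day. It is not necessarily the method we used in the case study, and the numbers are made up for illustration.

```ruby
# Made-up daily mention counts per channel, for illustration only.
channels = {
  "tv"      => [0, 4, 10, 5],
  "twitter" => [50, 200, 1000, 400]
}

# Scale a series so that its busiest day equals 1.0. A series with no
# mentions at all stays at zero rather than dividing by zero.
def normalize_to_peak(series)
  peak = series.max.to_f
  return series.map { 0.0 } if peak.zero?

  series.map { |n| n / peak }
end

normalized = channels.transform_values { |series| normalize_to_peak(series) }
normalized.each { |name, series| puts "#{name}: #{series.inspect}" }
```

With every channel on a 0-to-1 scale, it becomes easier to see which one started climbing first and which peaked first.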

Do an advanced search: The advanced search settings allow you to restrict your search by program, station, date, topic, and clip length. If you customize your search in the web interface, you’ll see which parameters get added to the search results URL. You can then copy and paste those into your Ruby script to add the same filters to your bulk data download.
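Instead of retyping the filters by hand, you could let Ruby copy them over. This sketch takes a search URL pasted from the browser, keeps its query parameters, and adds the ones a bulk download needs. The URL and every parameter name here are hypothetical stand-ins; the real ones are whatever appears in your address bar and in the script.

```ruby
require "uri"

# Hypothetical URL copied from the browser after customizing a search.
web_url = "https://example.org/tv-search?q=Obamacare&station=KGO&from=2013-01-01"

# Keep the filters from the pasted URL and add (or override) extra ones.
def bulk_url(web_url, extra)
  uri = URI.parse(web_url)
  params = URI.decode_www_form(uri.query.to_s).to_h.merge(extra)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

download_url = bulk_url(web_url, "rows" => 200, "output" => "json")
puts download_url
```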


Compare station-by-station coverage: You could also look at how coverage of a story or topic differs between the East and West coasts of the United States. The Internet Archive’s news database contains recordings from Washington, D.C., San Francisco, and national programs. Here’s a list of the station call letters and their locations.

Borrow DVDs of programs: If you want more than the short clip containing your search query, you can borrow a DVD recording of the full broadcast from the Internet Archive. To do so, you can either show up in person at the Internet Archive’s San Francisco library or pay a (sometimes refundable) $25-75 fee to have it mailed to you.

Matt Stempeck is a graduating master’s student at the MIT Media Lab’s Center for Civic Media.
