May 4, 2009, 8 a.m.

NPRbackstory: Finding value in news archives through automation

I watched the Kentucky Derby on Saturday — my allotted two minutes of horse racing a year — and got to see jockey Calvin Borel pilot a 50-1 longshot named Mine That Bird to a stirring come-from-behind victory. (My interest in horse racing is pretty much limited to rooting for jockeys like Borel, Robby Albarado, and Kent Desormeaux who are, like me, south Louisiana Cajuns. Horses, like crawfish, my people do well.)

Like a lot of people, I wrote about the Derby on Twitter. (Note this pre-race prediction that, had I acted on it, would have earned me $103.20 for three $2 bets.) After Borel’s victory, his name became one of the most popular search terms on Twitter.

But in the sea of people writing about Borel, there was one tweet that stood out:

It was from an automated Twitter account called NPRbackstory, a project by a man named Keith Hopper. Keith is a project manager at Public Interactive, a division of NPR, and NPRbackstory is an intriguing experiment in getting value out of one of the most overlooked assets any established news organization has: its archives.

The link in the Borel tweet is to a brief piece NPR’s Noah Adams did on the jockey two years ago, after he’d won his first Kentucky Derby. The genius of NPRbackstory is that it took no human intervention to create that tweet: the code behind it automatically detected that lots of people were suddenly searching for information about Calvin Borel, searched NPR’s archives for any Borel-related stories, found one, and posted a link to Twitter.

(I talked to Keith about this project way back last fall, before the Lab launched, and seeing that Borel tweet reminded me that I’d never written up our interview.)

SEEKING VALUE IN THE ARCHIVES

Keith told me back then that the original impetus behind NPRbackstory was the launch last July of NPR’s API, the programming interface that gives coders access to stories and other content the network has produced since 1995.

“We thought: What cool [thing] could we do with that? How might I quickly deploy something that uses the API?”

He ran into a question faced by any programmer who deals with news APIs. News organizations are really good at creating news stories. But those stories go stale very quickly. For a news-hungry world, what’s the use of having access to a bunch of radio stories from 1995?

“The NPR content is more rich in its breadth than it is in timeliness,” Keith said. “That’s probably true of most news archives. But the Internet places a high value on timeliness, and I was looking at the API saying, ‘There’s nothing timely here!’”

So he hit on the idea of providing the backstory to subjects currently in the news. “I think there’s this yearning for meaning in our content,” he said. “We want a lot of the same information, but packaged differently. I thought something that looked at the context or the background for something would be something I’d welcome seeing in my Twitter feed.”

NPRbackstory uses Google’s Hot Trends data to determine what topics people have suddenly started searching for in large numbers. It uses NPR’s API to search the archives, then uses Yahoo Pipes to create an RSS feed that then gets cycled into the NPRbackstory Twitter account.
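To make that chain concrete, here is a minimal Python sketch of the same shape: take a list of trending terms, look each one up in an archive, and turn any match into a short post. It is not Keith's implementation, which runs on Yahoo Pipes and NPR's API. The `Story` class, the `build_backstory_posts` function, the in-memory archive, the story title, the example.org URL, and the post format are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Story:
    title: str
    url: str


def build_backstory_posts(
    trending_terms: Iterable[str],
    search_archive: Callable[[str], Optional[Story]],
) -> List[str]:
    """Pair each trending term with one archive story, if one is found."""
    posts = []
    for term in trending_terms:
        story = search_archive(term)  # stand-in for a real archive search call
        if story is None:
            continue  # nothing in the archive for this term, so no post
        # Assumed post format: the search term, the story title, then the link.
        posts.append(f"{term}: {story.title} {story.url}")
    return posts


# Hypothetical in-memory archive; the title and URL are made up for the example.
FAKE_ARCHIVE = {
    "calvin borel": Story("A jockey's first Derby win", "https://example.org/borel"),
}


def fake_search(term: str) -> Optional[Story]:
    """Look a term up in the fake archive, ignoring case."""
    return FAKE_ARCHIVE.get(term.lower())


posts = build_backstory_posts(["Calvin Borel", "plankton"], fake_search)
print(posts)
```

Passing the search function in as an argument keeps the sketch independent of any particular trend source or archive API; a real version would swap `fake_search` for a call to whatever archive it has access to.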

HITS AND MISSES

The results, Keith will be the first to tell you, aren’t perfect. He estimated at the time we talked that about 50 percent of the links NPRbackstory finds aren’t really to archival stories; they’re to fresh news stories. The API provides access to stories as soon as NPR puts them into its system, and that sometimes means getting the hot news instead of the backstory. Take this tweet triggered by increased search traffic for the former congressman Jack Kemp; it points directly to an NPR story on his death Saturday night.
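One conceivable way to separate backstory from breaking news would be to skip any story younger than some cutoff. The Python sketch below shows the idea only; nothing in my conversation with Keith suggests NPRbackstory does this. The function name `is_backstory`, the seven-day default, and the 2007 publication date used for the older story are assumptions for illustration.

```python
from datetime import date, timedelta


def is_backstory(published: date, today: date, min_age_days: int = 7) -> bool:
    """Return True when a story is at least min_age_days old."""
    return today - published >= timedelta(days=min_age_days)


today = date(2009, 5, 4)

# A story published two days earlier counts as fresh news, not backstory.
print(is_backstory(date(2009, 5, 2), today))  # False

# A story from roughly two years earlier (exact date assumed) qualifies.
print(is_backstory(date(2007, 5, 7), today))  # True
```

The cutoff is a trade-off: set it too high and useful recent context gets dropped, too low and same-day stories slip through.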

Another 15 percent of the results, he said, are complete misses. Those are usually caused by search terms that have multiple meanings. For instance, there’s this tweet from Friday triggered by searches for insurance company The Hartford. Those searchers were likely hunting for information about the company’s announcement of lower-than-expected quarterly earnings. But NPRbackstory gave readers a link to this David Folkenflik piece about The Hartford Courant.

(And once in a while there’s something way out of left field, like this attempt to tie “plankton” to a memoir by the advice columnist Ask Amy. The word “plankton” appears once in the story, in the seventh paragraph. “It’s a fun project — it’s not a masterpiece,” Keith told me.)

But the rest of the time, it works really well, plucking a gem from the NPR archives that adds context and depth to some subject in the news. Keith compared it to the way that Fresh Air’s three-decade archive allows it to air something old but newly timely whenever a past interview subject is in the news again.

“It works really well on names,” Keith said. “If somebody OD’s in their hotel room, or they get busted for drunk driving, NPR probably wouldn’t do a story. So you get the last time Terry Gross interviewed them, or something from NPR Music. You get a true backstory when that happens.”

A MODEL TO FOLLOW?

Even though its results can be hit-or-miss, I think the idea behind NPRbackstory is brilliant. News archives are underused assets. For the news organizations that have invested in putting years (or over a century) of past work online, it’s worth investing time to figure out strategies to bring attention to it all.

And it’s also a smart use of Twitter. NPRbackstory gives me a few links a day to interesting stuff I wouldn’t otherwise find — embedded among the tweets from all my friends and others I follow. It’s almost exactly the right amount of material. And it also serves as an off-the-news newswire; just seeing what search terms are hot right now has tipped me off to stories I didn’t otherwise know were happening. (Without this Sunday tweet, I probably wouldn’t have heard about the comic Robert Schimmel’s arrest. That’s true even though the link NPRbackstory offers is to a months-old piece about his battles with cancer, not about the arrest.)

It would be interesting to see other mashups that use different sources to measure what’s hot, such as Wikipedia pageview data. (Traffic to Calvin Borel’s Wikipedia page is up 42,861 percent from normal post-Derby.) This is an area where I hope we see a lot more innovation, both by news organizations and by the outside programmers who can code against their APIs.

Image of Borel by R Castro, used under Creative Commons license.
