NPRbackstory: Finding value in news archives through automation

By Joshua Benton @jbenton May 4, 2009, 8 a.m.

I watched the Kentucky Derby on Saturday — my allotted two minutes of horse racing a year — and got to see jockey Calvin Borel pilot a 50-1 longshot named Mine That Bird to a stirring come-from-behind victory. (My interest in horse racing is pretty much limited to rooting for jockeys like Borel, Robby Albarado, and Kent Desormeaux who are, like me, south Louisiana Cajuns. Horses, like crawfish, my people do well.)

Like a lot of people, I wrote about the Derby on Twitter. (Note this pre-race prediction that, had I acted on it, would have earned me $103.20 for three $2 bets.) After Borel’s victory, his name became one of the most popular search terms on Twitter.

But in the sea of people writing about Borel, there was one tweet that stood out:

It was from an automated Twitter account called NPRbackstory, a project by a man named Keith Hopper. Keith is a project manager at Public Interactive, a division of NPR, and NPRbackstory is an intriguing experiment in getting value out of one of the most overlooked assets any established news organization has: its archives.

The link in the Borel tweet is to a brief piece NPR’s Noah Adams did on the jockey two years ago, after he’d won his first Kentucky Derby. The genius of NPRbackstory is that it took no human intervention to create that tweet: the code behind it automatically detected that lots of people were suddenly searching for information about Calvin Borel, searched NPR’s archives for any Borel-related stories, found one, and posted a link to Twitter.

(I talked to Keith about this project way back last fall, before the Lab launched, and seeing that Borel tweet reminded me that I’d never written up our interview.)

SEEKING VALUE IN THE ARCHIVES

Keith told me back then that the original impetus behind NPRbackstory was the launch last July of NPR’s API, the programming interface that gives coders access to stories and other content the network has produced since 1995.

“We thought: What cool [thing] could we do with that? How might I quickly deploy something that uses the API?”

He ran into a question faced by any programmer who deals with news APIs. News organizations are really good at creating news stories. But those stories go stale very quickly. For a news-hungry world, what’s the use of having access to a bunch of radio stories from 1995?

“The NPR content is more rich in its breadth than it is in timeliness,” Keith said. “That’s probably true of most news archives. But the Internet places a high value on timeliness, and I was looking at the API saying, ‘There’s nothing timely here!’”

So he hit on the idea of providing the backstory to subjects currently in the news. “I think there’s this yearning for meaning in our content,” he said. “We want a lot of the same information, but packaged differently. I thought something that looked at the context or the background for something would be something I’d welcome seeing in my Twitter feed.”

NPRbackstory uses Google’s Hot Trends data to determine what topics people have suddenly started searching for in large numbers. It uses NPR’s API to search the archives, then uses Yahoo Pipes to create an RSS feed that then gets cycled into the NPRbackstory Twitter account.
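For readers who code, the flow described above (take a trending term, search the archive, turn any hit into a tweet) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Keith's actual implementation: the `Story` fields, the function names, the example.org URL, and the 140-character limit are all hypothetical stand-ins, and plain in-memory lists take the place of Hot Trends, the NPR API, and Yahoo Pipes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Story:
    """A hypothetical archive record; real NPR API fields may differ."""
    title: str
    url: str
    text: str


def search_archive(term: str, archive: Iterable[Story]) -> Optional[Story]:
    """Return the first story whose title or text mentions the term.

    Matching is a naive case-insensitive substring test (an assumption).
    """
    needle = term.lower()
    for story in archive:
        if needle in story.title.lower() or needle in story.text.lower():
            return story
    return None


def build_tweet(term: str, story: Story, limit: int = 140) -> str:
    """Format 'term: title url', trimming the title to fit the limit.

    If the term and link alone exceed the limit, the result can still be
    too long; this sketch does not handle that case.
    """
    prefix = f"{term}: "
    suffix = f" {story.url}"
    room = max(limit - len(prefix) - len(suffix), 0)
    return prefix + story.title[:room].rstrip() + suffix


def backstory_tweets(trending_terms: Iterable[str],
                     archive: Iterable[Story]) -> List[str]:
    """Build one tweet per trending term that has an archive match."""
    stories = list(archive)
    tweets = []
    for term in trending_terms:
        story = search_archive(term, stories)
        if story is not None:
            tweets.append(build_tweet(term, story))
    return tweets


if __name__ == "__main__":
    # Made-up sample data standing in for the trends feed and the archive.
    sample_archive = [
        Story("Jockey Calvin Borel Wins Derby",
              "http://example.org/borel",
              "A profile of the jockey."),
    ]
    for tweet in backstory_tweets(["Calvin Borel", "plankton"], sample_archive):
        print(tweet)
```

A bare text match like this one is cheap to build, but it will happily return a story that mentions a term only in passing, so any real version would need some ranking or filtering on top.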

HITS AND MISSES

The results, Keith will be the first to tell you, aren’t perfect. He estimated at the time we talked that about 50 percent of the links NPRbackstory finds aren’t really to archival stories; they’re to fresh news stories. The API provides access to stories as soon as NPR puts them into its system, and that sometimes means getting the hot news instead of the backstory. Take this tweet triggered by increased search traffic for the former congressman Jack Kemp; it points directly to an NPR story on his death Saturday night.

Another 15 percent of the results, he said, are complete misses. Those are usually caused by search terms that have multiple meanings. For instance, there’s this tweet from Friday triggered by searches for insurance company The Hartford. Those searchers were likely hunting for information about the company’s announcement of lower-than-expected quarterly earnings. But NPRbackstory gave readers a link to this David Folkenflik piece about The Hartford Courant.

(And once in a while there’s something way out of left field, like this attempt to tie “plankton” to a memoir by the advice columnist Ask Amy. The word “plankton” appears once in the story, in the seventh paragraph. “It’s a fun project — it’s not a masterpiece,” Keith told me.)

But the rest of the time, it works really well, plucking a gem from the NPR archives that adds context and depth to some subject in the news. Keith compared it to the way that Fresh Air’s three-decade archive allows it to air something old but newly timely whenever a past interview subject is in the news again.

“It works really well on names,” Keith said. “If somebody OD’s in their hotel room, or they get busted for drunk driving, NPR probably wouldn’t do a story. So you get the last time Terry Gross interviewed them, or something from NPR Music. You get a true backstory when that happens.”

A MODEL TO FOLLOW?

Even though its results can be hit-or-miss, I think the idea behind NPRbackstory is brilliant. News archives are underused assets. For the news organizations that have invested in putting years (or over a century) of past work online, it’s worth investing time to figure out strategies to bring attention to it all.

And it’s also a smart use of Twitter. NPRbackstory gives me a few links a day to interesting stuff I wouldn’t otherwise find — embedded among the tweets from all my friends and others I follow. It’s almost exactly the right amount of material. And it also serves as an off-the-news newswire; just seeing what search terms are hot right now has tipped me off to stories I didn’t otherwise know were happening. (Without this Sunday tweet, I probably wouldn’t have heard about the comic Robert Schimmel’s arrest. That’s true even though the link NPRbackstory offers is to a months-old piece about his battles with cancer, not about the arrest.)

It would be interesting to see other mashups that used different sources to measure what’s hot — like perhaps Wikipedia pageview data. (Traffic to Calvin Borel’s Wikipedia page is up 42,861 percent from normal post-Derby.) But this is an area where I hope we see a lot more innovation — both by news organizations and by the outside programmers who can code against their APIs.

Image of Borel by R Castro, used under Creative Commons license.

Joshua Benton is the senior writer and former director of Nieman Lab. You can reach him via email (joshua_benton@harvard.edu) or Twitter DM (@jbenton).

