Nieman Foundation at Harvard
June 2, 2015, 10:15 a.m.
Audience & Social

How a group of researchers tried to use social media data and algorithms to find breaking news

Using geotagged Instagram data, CityBeat tries — often unsuccessfully or belatedly — to find breaking news.

Shortly after 9:30 a.m. on March 12, 2014, two apartment buildings in East Harlem exploded when a water main collapsed into a gas line. Eight people were killed and dozens more were injured.

Journalists rushed to the scene to cover the tragedy, but four newsrooms — The New York Times, BuzzFeed, Gothamist, The New York World — had another tool to help them cover the explosions: CityBeat, a program designed to algorithmically search geotagged social media posts to find news stories in New York City. CityBeat was built by researchers at Cornell Tech, Cornell’s applied sciences outpost in New York City, and Rutgers and was being tested by the four outlets at the time.

Social media posts about the building collapses appeared on CityBeat, but by the time there were enough posts to register in its algorithm, the news organizations themselves already knew about the explosion and had reporters and photographers on the scene.

“[The Harlem Fire] did show up, but it was half an hour later…at that point we’re not using Instagram,” one of the journalists interviewed by the researchers said, as quoted in their paper on the project.

CityBeat, the participants said, was most useful in covering planned events: conferences, concerts, or even PR stunts, such as when a man in a bear suit was spotted walking around Manhattan. The tool was less effective for covering real-time breaking news stories.

“That’s one of the things that we talked about in the limitations and understanding the biases of the information,” Raz Schwartz, one of the study’s authors, told me. “Social media data might not be the best way to find these breaking events.”

Schwartz now works on the user experience research team at Facebook, but conducted the study as part of his postdoctoral research at Cornell along with Cornell professor Mor Naaman and Rannie Teodoro from Rutgers. The research was funded by the Brown Institute for Media Innovation at Columbia, and Schwartz presented the paper last week at a conference in Oxford, England.

Though the researchers have moved on to other topics, CityBeat is still live. The site was designed to be shown on big screens in newsrooms and has three main components. There’s the Detected Events List, a compilation of events the algorithm has discovered in the past 24 hours using Instagram data. There’s also the Event Window, which shows specific events and their location within New York. The third element is a sidebar showing statistics on the rate of tweets, popular hashtags, and more.


To detect news events occurring around New York, the CityBeat algorithm examines geotagged Instagram data. If it notices a number of photos posted from one location, it’ll create a Candidate Event, which includes all the photos taken from that location that caused the alert. Once a Candidate Event is created, it’s automatically sent to Amazon Mechanical Turk workers to ensure that it’s actually a newsworthy event and not, say, a lot of people posting pictures of themselves visiting the Empire State Building. But this approach was “problematic,” the authors wrote in the paper.

“In many instances Amazon Mechanical Turk workers would get confused by the number of different photos that appeared and would classify actual events as noise,” the study says.
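The detection step described above, in which a burst of photos from one location becomes a Candidate Event, can be sketched roughly as follows. This is an illustration only, not CityBeat's actual code: the `Post` type, the grid-cell bucketing, the 30-minute window, and the five-post threshold are all assumptions made for the example, and the Mechanical Turk review step is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """A geotagged post (hypothetical shape, not Instagram's API)."""
    post_id: str
    lat: float
    lon: float
    timestamp: float  # seconds since the epoch


def grid_cell(lat, lon, cell_size=0.005):
    """Map a coordinate to a coarse grid cell (assumed cell size in degrees)."""
    return (int(lat // cell_size), int(lon // cell_size))


def find_candidate_events(posts, now, window=1800, min_posts=5, cell_size=0.005):
    """Group recent posts by grid cell and keep cells with unusually many posts.

    `window` (seconds) and `min_posts` are illustrative thresholds.
    """
    cells = defaultdict(list)
    for post in posts:
        # Only consider posts from the last `window` seconds.
        if 0 <= now - post.timestamp <= window:
            cells[grid_cell(post.lat, post.lon, cell_size)].append(post)
    # A cell becomes a candidate event once it has enough recent posts.
    return {cell: group for cell, group in cells.items() if len(group) >= min_posts}
```

A fixed threshold like this also hints at why the Empire State Building problem arises: a location that is always busy will clear the bar every day, so a real system would need to compare activity against what is normal for that spot or rely on human review, as CityBeat did.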

Algorithms and bots have become more commonplace in news of late. The Associated Press and the Los Angeles Times now use bots to write certain stories; apps such as SmartNews use algorithms to sort through millions of URLs to display stories for their users; and of course there’s Facebook, which can alter publishers’ fortunes with a tweak of its News Feed recipe.

And while the newsrooms that tested the platform weren’t sold on its utility for covering breaking news, Schwartz said he believes there are lessons to be learned from the CityBeat experiment about what role algorithms can play in covering breaking news.

“This is something that we see everywhere,” said Schwartz, referring to the increased use of editorial algorithms. “It’s growing and growing, and we have to understand what it means. We have to understand what happens when we give algorithms the reins of news selection and news making.”

Photo of March 12, 2014 gas explosion in East Harlem by AP/Jeremy Sailing.
