Nieman Foundation at Harvard
June 2, 2015, 10:15 a.m.
Audience & Social

How a group of researchers tried to use social media data and algorithms to find breaking news

Using geotagged Instagram data, CityBeat tries — often unsuccessfully or belatedly — to find breaking news.

Shortly after 9:30 a.m. on March 12, 2014, two apartment buildings in East Harlem exploded when a water main collapsed into a gas line. Eight people were killed and dozens more were injured.

Journalists rushed to the scene to cover the tragedy, but four newsrooms (The New York Times, BuzzFeed, Gothamist, and The New York World) had another tool to help them cover the explosions: CityBeat, a program designed to algorithmically search geotagged social media posts to find news stories in New York City. CityBeat was built by researchers at Cornell Tech, Cornell’s applied sciences outpost in New York City, and at Rutgers, and the four outlets were testing it at the time.

Social media posts about the building collapses appeared on CityBeat, but by the time there were enough posts to register in its algorithm, the news organizations themselves already knew about the explosion and had reporters and photographers on the scene.

“[The Harlem Fire] did show up, but it was half an hour later…at that point we’re not using Instagram,” one of the journalists interviewed by the researchers said in their paper on the project.

CityBeat, the participants said, was most useful in covering planned events such as conferences, concerts, or even PR stunts, like the time a man in a bear suit was spotted walking around Manhattan. The tool was less effective for covering real-time breaking news stories.

“That’s one of the things that we talked about in the limitations and understanding the biases of the information,” Raz Schwartz, one of the study’s authors, told me. “Social media data might not be the best way to find these breaking events.”

Schwartz now works on the user experience research team at Facebook, but conducted the study as part of his postdoctoral research at Cornell along with Cornell professor Mor Naaman and Rannie Teodoro from Rutgers. The research was funded by the Brown Institute for Media Innovation at Columbia, and Schwartz presented the paper last week at a conference in Oxford, England.

Though the researchers have moved on to other topics, CityBeat is still live. The site was designed to be shown on big screens in newsrooms and has three main components. There’s the Detected Events List, a compilation of events the algorithm has discovered in the past 24 hours using Instagram data. There’s also the Event Window, which shows specific events and their location within New York. The third element is a sidebar showing statistics on the rate of tweets, popular hashtags, and more.


To detect news events occurring around New York, the CityBeat algorithm examines geotagged Instagram data. If it notices a number of photos posted from one location, it’ll create a Candidate Event, which includes all the photos taken from that location that caused the alert. Once a Candidate Event is created, it’s automatically sent to Amazon Mechanical Turk workers to ensure that it’s actually a newsworthy event and not, say, a lot of people posting pictures of themselves visiting the Empire State Building. But this approach was “problematic,” the authors wrote in the paper.

“In many instances Amazon Mechanical Turk workers would get confused by the number of different photos that appeared and would classify actual events as noise,” the study says.
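
The detection step described above, flagging a location when an unusual number of photos is posted from it, can be illustrated with a short sketch. This is not CityBeat's actual code, and the article does not say how the researchers defined "one location" or "a number of photos." The grid size, time window, threshold, and every name below (`Post`, `CandidateEvent`, `grid_cell`, `detect_candidate_events`) are assumptions made purely for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    """A geotagged photo post (hypothetical structure, not Instagram's API)."""
    post_id: str
    lat: float
    lon: float
    timestamp: float  # seconds since the epoch


@dataclass
class CandidateEvent:
    """A cluster of recent posts from one grid cell, awaiting human review."""
    cell: tuple
    posts: list = field(default_factory=list)


def grid_cell(lat, lon, cell_size_deg=0.002):
    """Map a coordinate to a coarse grid cell.

    The cell size is an assumed value: 0.002 degrees of latitude is
    roughly 200 meters.
    """
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))


def detect_candidate_events(posts, now, window_s=1800, min_posts=5):
    """Return one CandidateEvent per cell with at least `min_posts` recent posts.

    `window_s` and `min_posts` are illustrative defaults, not values
    taken from the CityBeat paper.
    """
    recent_by_cell = defaultdict(list)
    for post in posts:
        # Keep only posts made within the last `window_s` seconds.
        if 0 <= now - post.timestamp <= window_s:
            recent_by_cell[grid_cell(post.lat, post.lon)].append(post)

    return [
        CandidateEvent(cell, cell_posts)
        for cell, cell_posts in recent_by_cell.items()
        if len(cell_posts) >= min_posts
    ]
```

A fixed threshold like this cannot tell a breaking story from a landmark that is always busy, which is the gap the Mechanical Turk review was meant to fill. It also shows why the tool lagged in Harlem: nothing is flagged until enough posts have accumulated in a cell.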

Algorithms and bots have become more commonplace in news of late. The Associated Press and the Los Angeles Times now use bots to write certain stories; apps such as SmartNews use algorithms to sort through millions of URLs to display stories for their users; and of course there’s Facebook, which can alter publishers’ fortunes with a tweak of its News Feed recipe.

And while the newsrooms that tested the platform weren’t sold on its utility for covering breaking news, Schwartz said he believes there are lessons to be learned from the CityBeat experiment about what role algorithms can play in covering breaking news.

“This is something that we see everywhere,” said Schwartz, referring to the increased use of editorial algorithms. “It’s growing and growing, and we have to understand what it means. We have to understand what happens when we give algorithms the reins of news selection and news making.”

Photo of the March 12, 2014, gas explosion in East Harlem by AP/Jeremy Sailing.
