Nieman Foundation at Harvard
Dec. 9, 2014, 2:56 p.m.
Reporting & Production

The New York Times R&D Lab releases Hive, an open-source crowdsourcing tool

“We want to learn from others who are doing good things, and when we learn things we share them as well.”

A few months ago we told you about a new tool from The New York Times that allowed readers to help identify ads inside the paper's massive archive. Madison, as it was called, was the first iteration of a new crowdsourcing tool from The New York Times R&D Lab. The tool was meant to make it easier to break a project into specific tasks and to enlist users in helping an organization collect the data it needs.

Today the R&D Lab is opening up the platform that powers the whole thing. Hive is an open-source framework that lets anyone build their own crowdsourcing project, and its code is now available on GitHub. With Hive, a developer can create assignments for users, define what they need to do, and track their progress in helping to solve problems.
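To make the idea of assignments and progress tracking concrete, here is a minimal Go sketch. The types `Project`, `Task` and `Assignment` and the methods `Assign` and `Progress` are hypothetical names chosen for illustration. They are not Hive's actual API, which is documented in the project's GitHub repository.

```go
package main

import "fmt"

// AssignmentStatus records where one unit of crowd work stands.
// Hypothetical names for illustration; not taken from Hive's code.
type AssignmentStatus int

const (
	Unfinished AssignmentStatus = iota
	Finished
	Skipped
)

// Task describes one kind of work contributors are asked to do.
type Task struct {
	Name         string
	Instructions string
}

// Assignment pairs one contributor with one asset for one task.
type Assignment struct {
	User    string
	Task    string
	AssetID string
	Status  AssignmentStatus
}

// Project groups the tasks a developer defines with the assignments handed out for them.
type Project struct {
	Name        string
	Tasks       map[string]Task
	Assignments []*Assignment
}

// Assign creates a new unfinished assignment, rejecting tasks the project does not define.
func (p *Project) Assign(user, task, assetID string) (*Assignment, error) {
	if _, ok := p.Tasks[task]; !ok {
		return nil, fmt.Errorf("unknown task %q", task)
	}
	a := &Assignment{User: user, Task: task, AssetID: assetID, Status: Unfinished}
	p.Assignments = append(p.Assignments, a)
	return a, nil
}

// Progress counts how many of a user's assignments are finished, out of all given to them.
func (p *Project) Progress(user string) (finished, total int) {
	for _, a := range p.Assignments {
		if a.User != user {
			continue
		}
		total++
		if a.Status == Finished {
			finished++
		}
	}
	return finished, total
}

func main() {
	p := &Project{
		Name:  "ad-archive",
		Tasks: map[string]Task{"find": {Name: "find", Instructions: "Is there an ad on this page?"}},
	}
	a, _ := p.Assign("reader1", "find", "page-001")
	a.Status = Finished
	p.Assign("reader1", "find", "page-002")
	done, total := p.Progress("reader1")
	fmt.Printf("%d of %d assignments finished\n", done, total)
}
```

Keeping tasks as named definitions that are separate from the assignments issued against them is one way to let a single engine serve many projects. The developer only changes the task list.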

Here’s the R&D Lab’s Jacqui Maher with some of the nuts and bolts of Hive:

The system we built is Hive, an open-source platform that lets developers produce crowdsourcing applications for a variety of contexts. Informed by our work on Streamtools, Hive’s technical architecture takes advantage of Go’s efficiency in parsing and transmitting JSON along with its straightforward interface to Elasticsearch. Combining the speed of a compiled language with the flexibility of a search engine means Hive is able to handle a wide variety of user-submitted contributions on diverse sets of tasks.

Matt Boggie, executive director of the R&D Lab, said Madison evolved from the print archive app TimesMachine. While building the tool, though, the team realized it could serve purposes well beyond the Times' back pages. "The big thing was we realized the problem we were solving was one particular manifestation of a common problem lots of organizations have," he said.

The decision to make Hive open-source was fairly simple, he said, since so many news organizations have made a habit of asking readers for help in sifting through documents or making sense of disorganized piles of data. The benefit to the Times is seeing how other people and organizations use the platform and what ideas they can apply at the paper. “We want to learn from others who are doing good things, and when we learn things we share them as well,” he said.

In the case of Madison, the Times needed several types of data: the text of an ad, the product it was selling, and any information about its visuals or size. Boggie said the trick was to build a system that fit the paper's specific needs while staying open enough to be useful for other purposes. The solution was to break crowdsourcing down into a series of smaller tasks that form a kind of feedback loop. In Madison, for instance, users are asked to find, tag, and transcribe ads. Each step depends on the work done in the others: before anyone can tag or transcribe an ad, someone first has to correctly identify it as an ad.
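The dependency between Madison's steps can be sketched as a gating rule. An asset only becomes eligible for tagging or transcription once the find step has established that it is an ad. In this Go sketch, the `Asset` type, the three-answer `quorum`, the simple-majority rule and the sample ID are all assumptions made for illustration. Neither Boggie nor the Times described these details.

```go
package main

import "fmt"

// quorum is a hypothetical number of "find" answers needed before a verdict.
const quorum = 3

// Asset holds what contributors have established about one region of a
// scanned page. Hypothetical type; not Madison's or Hive's data model.
type Asset struct {
	ID         string
	IsAdVotes  []bool // answers collected by the "find" task
	Product    string // filled in by the "tag" task
	Transcript string // filled in by the "transcribe" task
}

// confirmedAd reports whether at least quorum finders answered and a
// strict majority of them said the region is an ad.
func (a Asset) confirmedAd() bool {
	if len(a.IsAdVotes) < quorum {
		return false
	}
	yes := 0
	for _, v := range a.IsAdVotes {
		if v {
			yes++
		}
	}
	return yes*2 > len(a.IsAdVotes)
}

// EligibleTasks lists the tasks a contributor may be given for this asset.
// Tagging and transcription unlock only after "find" confirms an ad.
func (a Asset) EligibleTasks() []string {
	if len(a.IsAdVotes) < quorum {
		return []string{"find"}
	}
	if !a.confirmedAd() {
		return nil // consensus says not an ad, so later tasks never see it
	}
	var tasks []string
	if a.Product == "" {
		tasks = append(tasks, "tag")
	}
	if a.Transcript == "" {
		tasks = append(tasks, "transcribe")
	}
	return tasks
}

func main() {
	a := Asset{ID: "1962-page12-region3"} // hypothetical asset ID
	fmt.Println(a.EligibleTasks())       // [find]
	a.IsAdVotes = []bool{true, true, false}
	fmt.Println(a.EligibleTasks()) // [tag transcribe]
	a.Product = "soap"
	fmt.Println(a.EligibleTasks()) // [transcribe]
}
```

Gating work this way keeps contributors from transcribing regions that turn out not to be ads. It also means each finished step supplies the input for the next.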

So far, Boggie said, more than 14,000 people have used Madison and contributed some form of work, and more than 100,000 assignments have been completed. The team hopes to open up a new set of ads (get ready for the 1970s) in early 2015, and also plans to release the data Madison has collected on the ads from the 1960s.
