Nieman Foundation at Harvard
Dec. 9, 2014, 2:56 p.m.
Reporting & Production

The New York Times R&D Lab releases Hive, an open-source crowdsourcing tool

“We want to learn from others who are doing good things, and when we learn things we share them as well.”

A few months ago we told you about a new tool from The New York Times that allowed readers to help identify ads inside the paper’s massive archive. Madison, as it was called, was the first iteration of a new crowdsourcing tool from The New York Times R&D Lab that would make it easier to break down specific tasks and get users to help an organization get at the data it needs.

Today the R&D Lab is opening up the platform that powers the whole thing. Hive is an open-source framework that lets anyone build their own crowdsourcing project. The code responsible for Hive is now available on GitHub. With Hive, a developer can create assignments for users, define what they need to do, and keep track of their progress in helping to solve problems.

Here’s the R&D Lab’s Jacqui Maher with some of the nuts and bolts of Hive:

The system we built is Hive, an open-source platform that lets developers produce crowdsourcing applications for a variety of contexts. Informed by our work on Streamtools, Hive’s technical architecture takes advantage of Go’s efficiency in parsing and transmitting JSON along with its straightforward interface to Elasticsearch. Combining the speed of a compiled language with the flexibility of a search engine means Hive is able to handle a wide variety of user-submitted contributions on diverse sets of tasks.

Matt Boggie, executive director of the R&D Lab, said Madison evolved from the print archive app TimesMachine, but in creating the tool they realized it could serve multiple purposes outside the Times’ back pages. “The big thing was we realized the problem we were solving was one particular manifestation of a common problem lots of organizations have,” he said.

The decision to make Hive open-source was fairly simple, he said, since so many news organizations have made a habit of asking readers for help in sifting through documents or making sense of disorganized piles of data. The benefit to the Times is seeing how other people and organizations use the platform and what ideas they can apply at the paper. “We want to learn from others who are doing good things, and when we learn things we share them as well,” he said.

In the case of Madison, the Times needed several types of data: the text of an ad, the product it was selling, and any information on the visuals or the size of the ad. Boggie said the trick was to make a system that could fit their specific needs while also being open enough to be useful for other purposes. The solution was to break crowdsourcing down into a series of smaller tasks that create a kind of feedback loop. For instance, in Madison, users are asked to find, tag, and transcribe ads. Each of those steps is only possible through the work of the others; in order to tag or transcribe an ad, someone first has to correctly identify what is an ad.
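The dependency between Madison's find, tag, and transcribe steps can be illustrated with a minimal sketch. It is written in Python for brevity rather than in Go, the language Hive uses, and it is not Hive's actual API or data model: the `Asset` class, the `PREREQUISITES` table, and both functions are hypothetical names invented here to show how one task can unlock the next.

```python
from dataclasses import dataclass, field

# Hypothetical task names and ordering, based only on the article's
# description of Madison. This is not Hive's real configuration format.
PREREQUISITES = {
    "find": None,          # locating an ad on a page needs no earlier work
    "tag": "find",         # tagging requires that an ad has been identified
    "transcribe": "find",  # so does transcription
}


@dataclass
class Asset:
    """One candidate region of an archive page (hypothetical structure)."""
    asset_id: str
    completed: set = field(default_factory=set)


def available_tasks(asset):
    """Return the tasks that could be assigned for this asset right now."""
    return sorted(
        task
        for task, prereq in PREREQUISITES.items()
        if task not in asset.completed
        and (prereq is None or prereq in asset.completed)
    )


def complete(asset, task):
    """Record a finished task, refusing work whose prerequisite is missing."""
    if task not in available_tasks(asset):
        raise ValueError(f"{task!r} is not available for {asset.asset_id}")
    asset.completed.add(task)
```

Under these assumptions, a new asset offers only the `find` task. Once a contributor completes it, `tag` and `transcribe` both become available, which is the loop Boggie describes: each finished assignment produces the input for the next.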

Boggie said that so far more than 14,000 people have used Madison and contributed some form of work. More than 100,000 assignments have been completed, and Boggie said they hope to open up a new set of ads (get ready for the 1970s) in early 2015. They also plan to make the data Madison collected on ads from the 1960s available.
