Nieman Foundation at Harvard
Dec. 9, 2014, 2:56 p.m.
Reporting & Production

The New York Times R&D Lab releases Hive, an open-source crowdsourcing tool

“We want to learn from others who are doing good things, and when we learn things we share them as well.”

A few months ago we told you about a new tool from The New York Times that allowed readers to help identify ads inside the paper’s massive archive. Madison, as it was called, was the first iteration of a new crowdsourcing tool from The New York Times R&D Lab that would make it easier to break down specific tasks and get users to help an organization get at the data it needs.

Today the R&D Lab is opening up the platform that powers the whole thing. Hive is an open-source framework that lets anyone build their own crowdsourcing project. The code responsible for Hive is now available on GitHub. With Hive, a developer can create assignments for users, define what they need to do, and keep track of their progress in helping to solve problems.

Here’s the R&D Lab’s Jacqui Maher with some of the nuts and bolts of Hive:

The system we built is Hive, an open-source platform that lets developers produce crowdsourcing applications for a variety of contexts. Informed by our work on Streamtools, Hive’s technical architecture takes advantage of Go’s efficiency in parsing and transmitting JSON along with its straightforward interface to Elasticsearch. Combining the speed of a compiled language with the flexibility of a search engine means Hive is able to handle a wide variety of user-submitted contributions on diverse sets of tasks.

Matt Boggie, executive director of the R&D Lab, said Madison evolved from the print archive app TimesMachine, but in creating the tool they realized it could serve multiple purposes outside the Times’ back pages. “The big thing was we realized the problem we were solving was one particular manifestation of a common problem lots of organizations have,” he said.

The decision to make Hive open-source was fairly simple, he said, since so many news organizations have made a habit of asking readers for help in sifting through documents or making sense of disorganized piles of data. The benefit to the Times is seeing how other people and organizations use the platform and what ideas they can apply at the paper. “We want to learn from others who are doing good things, and when we learn things we share them as well,” he said.

In the case of Madison, the Times needed several types of data: the text of an ad, the product it was selling, and any information on the visuals or the size of the ad. Boggie said the trick was to make a system that could fit their specific needs while also being open enough to be useful for other purposes. The solution was to break crowdsourcing down into a series of smaller tasks that create a kind of feedback loop. For instance, in Madison, users are asked to find, tag, and transcribe ads. Each of those steps is possible only through the work of the others; in order to tag or transcribe an ad, you first have to correctly identify what is an ad.
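To make that task breakdown concrete, here is a minimal sketch in Python of how find, tag, and transcribe steps could depend on one another. It is an illustration only, not Hive's actual code or data model (Hive itself is written in Go); the names `TASK_PREREQS`, `Item`, `available_tasks`, and `complete` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical task names borrowed from the article's description of Madison.
# Each task maps to the tasks that must already be done for the same item.
TASK_PREREQS = {
    "find": [],
    "tag": ["find"],
    "transcribe": ["find"],
}


@dataclass
class Item:
    """One unit of archive material, e.g. a candidate ad on a scanned page."""
    item_id: str
    completed: set = field(default_factory=set)


def available_tasks(item):
    """Return the tasks a user could be assigned for this item right now."""
    return [
        task
        for task, prereqs in TASK_PREREQS.items()
        if task not in item.completed
        and all(p in item.completed for p in prereqs)
    ]


def complete(item, task):
    """Record a finished assignment; reject work whose prerequisites are unmet."""
    if task not in available_tasks(item):
        raise ValueError(f"task {task!r} is not available for {item.item_id}")
    item.completed.add(task)


if __name__ == "__main__":
    ad = Item("page-12-region-3")   # made-up identifier
    print(available_tasks(ad))      # only "find" is open at first
    complete(ad, "find")
    print(available_tasks(ad))      # "tag" and "transcribe" open up
```

In this sketch an item cannot be tagged or transcribed until someone has marked it as an ad, which mirrors the dependency Boggie describes. A real system would also need to reconcile conflicting answers from different users, which is left out here.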

Boggie said that so far more than 14,000 people have used Madison and contributed some form of work. More than 100,000 assignments have been completed, and Boggie said they hope to open up a new set of ads (get ready for the 1970s) in early 2015. They also plan to make the data Madison has collected on the ads from the 1960s available.
