Nieman Foundation at Harvard
Dec. 9, 2014, 2:56 p.m.
Reporting & Production

The New York Times R&D Lab releases Hive, an open-source crowdsourcing tool

“We want to learn from others who are doing good things, and when we learn things we share them as well.”

A few months ago we told you about a new tool from The New York Times that allowed readers to help identify ads inside the paper’s massive archive. Madison, as it was called, was the first iteration of a new crowdsourcing tool from The New York Times R&D Lab that would make it easier to break a project down into specific tasks and get users to help an organization get at the data it needs.

Today the R&D Lab is opening up the platform that powers the whole thing. Hive is an open-source framework that lets anyone build their own crowdsourcing project. The code responsible for Hive is now available on GitHub. With Hive, a developer can create assignments for users, define what they need to do, and keep track of their progress in helping to solve problems.

Here’s the R&D Lab’s Jacqui Maher with some of the nuts and bolts of Hive:

The system we built is Hive, an open-source platform that lets developers produce crowdsourcing applications for a variety of contexts. Informed by our work on Streamtools, Hive’s technical architecture takes advantage of Go’s efficiency in parsing and transmitting JSON along with its straightforward interface to Elasticsearch. Combining the speed of a compiled language with the flexibility of a search engine means Hive is able to handle a wide variety of user-submitted contributions on diverse sets of tasks.

Matt Boggie, executive director of the R&D Lab, said Madison evolved from the print archive app TimesMachine, but in creating the tool they realized it could serve multiple purposes outside the Times’ back pages. “The big thing was we realized the problem we were solving was one particular manifestation of a common problem lots of organizations have,” he said.

The decision to make Hive open-source was fairly simple, he said, since so many news organizations have made a habit of asking readers for help in sifting through documents or making sense of disorganized piles of data. The benefit to the Times is seeing how other people and organizations use the platform and what ideas they can apply at the paper. “We want to learn from others who are doing good things, and when we learn things we share them as well,” he said.

In the case of Madison, the Times needed several types of data: the text of an ad, the product it was selling, and any information on the visuals or the size of the ad. Boggie said the trick was to make a system that could fit their specific needs while also being open enough to be useful for other purposes. The solution was to break crowdsourcing down into a series of smaller tasks that create a kind of feedback loop. For instance, in Madison, users are asked to find, tag, and transcribe ads. Each of those steps is possible only through the work of the others; in order to tag or transcribe an ad, someone first has to correctly identify what is an ad.
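As a rough illustration of that dependency between steps, here is a minimal Python sketch. It is not Hive's code or API (Hive itself is written in Go); the `Asset` class, the `PREREQUISITES` table, and the asset ID are hypothetical names invented for this example, and the only detail taken from the article is that tagging and transcribing depend on an ad first being found.

```python
from dataclasses import dataclass, field

# Hypothetical task table based on the article's Madison example.
# Each task maps to the task that must be finished first (None = no prerequisite).
PREREQUISITES = {"find": None, "tag": "find", "transcribe": "find"}


@dataclass
class Asset:
    """One archive item to be worked on; the fields are illustrative only."""
    asset_id: str
    completed: set = field(default_factory=set)

    def available_tasks(self):
        """Return unfinished tasks whose prerequisite has been completed."""
        return sorted(
            task
            for task, prereq in PREREQUISITES.items()
            if task not in self.completed
            and (prereq is None or prereq in self.completed)
        )

    def complete(self, task):
        """Record a finished task, rejecting work whose prerequisite is missing."""
        if task not in self.available_tasks():
            raise ValueError(f"task {task!r} is not available for {self.asset_id}")
        self.completed.add(task)


page = Asset("page-001")
first = page.available_tasks()       # only "find" is open at the start
page.complete("find")
after_find = page.available_tasks()  # "tag" and "transcribe" open up
```

In this sketch a new asset offers only the "find" task, and completing it is what makes "tag" and "transcribe" assignable, which mirrors the feedback loop Boggie describes.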

Boggie said that so far more than 14,000 people have used Madison and contributed some form of work. More than 100,000 assignments have been completed, and Boggie said they hope to open up a new set of ads (get ready for the 1970s) in early 2015. They also plan to make available the data Madison has collected on the ads from the 1960s.
