Nieman Foundation at Harvard
March 1, 2013, 1:19 p.m.
LINK: www.theverge.com  ➚   |   Posted by: Joshua Benton   |   March 1, 2013

Russell Brandom at The Verge has a piece on Common Crawl, “a non-profit foundation dedicated to providing an open repository of web crawl data that can be accessed and analyzed by everyone.” At one extreme, that dataset could be used to build your own local or targeted search engine; at a smaller scale, it could be a boon to data journalists:

For example, web crawl data can be used to spot trends and identify patterns in politics, economics, health, popular culture and many other aspects of life. It provides an immensely rich corpus for scientific research, technological advancement, and innovative new businesses. It is crucial for our information-based society that the web be openly accessible to anyone who desires to utilize it.

Be forewarned: If you think a Hadoop cluster is a kind of Easter candy, this isn’t the weekend hacking project for you. (Here’s an earlier piece from MIT Technology Review.)
