Nieman Foundation at Harvard
March 1, 2013, 1:19 p.m.
LINK: www.theverge.com | Posted by: Joshua Benton | March 1, 2013

Russell Brandom at The Verge has a piece on Common Crawl, “a non-profit foundation dedicated to providing an open repository of web crawl data that can be accessed and analyzed by everyone.” At one extreme, that dataset could be used to build your own local or targeted search engine; at a smaller scale, it could be a boon to data journalists:

For example, web crawl data can be used to spot trends and identify patterns in politics, economics, health, popular culture and many other aspects of life. It provides an immensely rich corpus for scientific research, technological advancement, and innovative new businesses. It is crucial for our information-based society that the web be openly accessible to anyone who desires to utilize it.

Be forewarned: If you think a Hadoop cluster is a kind of Easter candy, this isn’t the weekend hacking project for you. (Here’s an earlier piece from MIT Technology Review.)
