May 29, 2012, 1:50 p.m.
Reporting & Production
Keys on computer keyboard spelling "geek"

3 new ideas on the future of news from MIT Media Lab students

Adding metadata to hyperlinks, finding stories in ordinary datasets, providing context for impossibly big numbers.

Ethan Zuckerman of the MIT Center for Civic Media taught a class this semester tailor-made for Nieman Lab readers: “News in the Age of Participatory Media.” The hook: What happens if you treat journalism as an engineering problem, bringing together the efforts of journalists and computer scientists?

The course’s final class last week featured a lot of bright students presenting their final projects, each of which was supposed to be a new tool, technique, or technology for reporting the news. (The projects were in various stages of completion.) I’ll be breaking out a few of the good ideas in future posts, but here are some that stood out to me.

Modernizing the hyperlink

The <a> tag hasn’t changed much since Tim Berners-Lee proposed it 20 years ago. Hyperlinks are the fiber of the web. But Neha Narula, a Ph.D. student in computer science at MIT, finds herself frustrated with writers who abuse them. Blog posts littered with too many links lead to “cognitive overload,” she says. “As I explored this topic a little more,” she said, “I found what I was annoyed with was not linking too much but not linking well.” If Google is mentioned in copy, does the word have to be linked to the Google home page? Does the same link need to appear multiple times in one story?

Narula proposed the use of microformats and the little-known rev attribute to attach semantic meaning to links, allowing browsers to handle different kinds of links differently. (rev is meant to describe a reverse link, which makes it a cousin of rel. All major browsers currently ignore it.)

For example, a link to a citation (dictionary definition, Wikipedia article) would get rev="bib", for bibliography. So:

<a href="http://en.wikipedia.org/en/Nieman_Foundation" rev="bib">

might lead to that link being presented not in the body copy, but at the bottom of the post, in the form of a tidy bibliography.

She also proposes rev="reaction", which would clearly call out the original post an article is responding to; and rev="object" for links to people and companies, which would facilitate an index for all of the proper nouns in a piece.
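The idea of pulling tagged links out of the body copy can be sketched in a few lines of Python. This is not Narula's mock-up (hers used JavaScript and CSS); it is a minimal illustration that assumes links carry the proposed rev values. The class name RevLinkCollector, the bibliography helper, the sample markup, and the example.com address are all hypothetical.

```python
from html.parser import HTMLParser


class RevLinkCollector(HTMLParser):
    """Group <a> links by the value of their rev attribute."""

    def __init__(self):
        super().__init__()
        self.links = {}        # rev value -> list of (href, link text)
        self._current = None   # (rev, href, text parts) while inside a tagged <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rev, href = attributes.get("rev"), attributes.get("href")
        if rev and href:
            self._current = (rev, href, [])

    def handle_data(self, data):
        # Collect link text only while we are inside a tagged link.
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            rev, href, parts = self._current
            text = "".join(parts).strip()
            self.links.setdefault(rev, []).append((href, text))
            self._current = None


def bibliography(html):
    """Return numbered bibliography lines for every rev="bib" link."""
    collector = RevLinkCollector()
    collector.feed(html)
    collector.close()
    bib_links = collector.links.get("bib", [])
    return [f"{n}. {text} <{href}>" for n, (href, text) in enumerate(bib_links, 1)]


# Hypothetical sample markup; only the Wikipedia link comes from the article.
SAMPLE = (
    '<p>The <a href="http://en.wikipedia.org/en/Nieman_Foundation" rev="bib">'
    "Nieman Foundation</a> published a reply to "
    '<a href="http://example.com/original-post" rev="reaction">this post</a>.</p>'
)
```

Calling bibliography(SAMPLE) yields a one-item list for the Wikipedia citation, while the reaction link stays available under collector.links["reaction"] for separate treatment.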

Perhaps most intriguing was rev="set" for a series of links, to avoid awkwardness when linking to (for example) this series of Lab articles on the hyperlinking debate. She mocked up a little bit of JavaScript and CSS to show how it could look. (Hover over “Twitter users to follow” or “BBC linking policy.” You can also see mockups of the object, reaction, and bib attributes there.)

Oh, and the biggest crowd pleaser was a feature you may love or hate: a button that toggles off all links in a document for distraction-free (or, er, context-free) reading.

Others have proposed approaches to adding metadata to links, from nofollow to syndication-source to standout to FOAF. Zuckerman suggested Narula create WordPress and Drupal plugins to encourage adoption. Getting the rest of the web on board would be a tall order.

Searching for correlations in a haystack

Eugene Wu, a graduate student in computer science at MIT, demonstrated a suite of tools called DBTruck that makes data comparison a snap. Enter the URL of a CSV file, JSON data, or an HTML table and DBTruck will clean up the data and import it into a local database. Normally you might go to a web page that contains a table, select and copy it, paste it into an Excel spreadsheet, then spend 15 minutes trying to fix the misplaced cells and formatting issues. DBTruck is automated and fast.
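To make the clean-and-import step concrete, here is a rough sketch using only Python's standard library. It is not DBTruck's code, and the article does not say how DBTruck works internally; the functions clean_name and load_csv, the choice of SQLite, and the sample figures are all assumptions made up for illustration.

```python
import csv
import io
import re
import sqlite3


def clean_name(raw):
    """Turn a messy header such as ' Rate (%)' into a plain name such as 'rate'."""
    name = re.sub(r"[^0-9a-z]+", "_", raw.strip().lower()).strip("_")
    return name or "column"


def load_csv(text, conn, table):
    """Create `table` from CSV text and return the number of rows imported.

    A sketch only: it stores every value as text and does not handle
    duplicate column names.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = [clean_name(h) for h in header]
    table = clean_name(table)
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    # Trim stray whitespace and skip rows whose width does not match the header.
    clean_rows = [
        [cell.strip() for cell in row] for row in body if len(row) == len(columns)
    ]
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', clean_rows)
    return len(clean_rows)


# Made-up sample data with the kind of mess a pasted table tends to have.
SAMPLE_CSV = "Town , Rate (%)\n Cambridge ,4.1\nSomerville, 5.0\nbroken row\n"
```

With conn = sqlite3.connect(":memory:"), calling load_csv(SAMPLE_CSV, conn, "towns") imports the two well-formed rows and drops the malformed one, after which the table can be queried with ordinary SQL.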

The program allows you to geocode any field that contains address information, whether that field is “Cambridge, MA” or “Cambridge, Mass.” or “1 Francis Ave, Cambridge.” Humans have come up with many ways to represent physical locations, but geographic coordinates are unambiguous instructions for computers to map a location. When you’re dealing with disorganized datasets, getting consistency is key.

Wu’s tool then lets you plot arbitrary comparisons between datasets. To test the program he plugged in all kinds of datasets, just for fun. Is there a correlation between addresses of Massachusetts lottery winners and Taco Bell locations? (No.) Suicide rates and unemployment rates in New York state? (No.) Then he stumbled upon a connection that made sense: In New York state communities, high teen pregnancy rates correlated strongly with low birth weights. There’s a potential story there that Wu might not have otherwise set out to write. Zuckerman advised Wu to team up with The Boston Globe to run more arbitrary comparisons and discover what local stories might be hidden in the numbers. (It also seems like a dandy add-on to the PANDA Project, which is building a platform for in-house newsroom databases.)
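Running arbitrary comparisons amounts to scoring every pair of columns and looking at the strongest relationships first. The article does not say which measure Wu's tool uses; the sketch below assumes the Pearson correlation coefficient, and the function names pearson and rank_pairs are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_pairs(columns):
    """Score every pair of named columns, strongest correlation first.

    `columns` maps a column name to its list of values. Returns a list of
    (name_a, name_b, r) tuples sorted by the absolute value of r.
    """
    scored = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: abs(item[2]), reverse=True)
```

A strong score is only a lead, not a finding: as the lottery and Taco Bell example suggests, most arbitrary pairings mean nothing, and a reporter would still need to check whether a high-scoring pair makes sense.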

How many Rhode Islands is that?

Nieman Fellow Paul Salopek and Knight Science Journalism Fellow/Reuters correspondent Alister Doyle have covered large-scale calamities in far-off countries for domestic audiences sometimes too busy to care. Foreign correspondents have tricks, sometimes clichés, to get people to pay attention, comparing populations and land masses to familiar American things. Write Salopek and Doyle:

Too often we just get a giant number — the U.S. debt is $15 trillion, Chinese greenhouse gases are the highest in the world at 7 billion tonnes a year, Americans spend $8 billion a year on cosmetics, etc. Is there some way of helping to put these statistics — huge to the point of meaningless — into an understandable, human framework?

They propose something like a currency converter that turns impossibly big numbers into more qualitative terms. Great for a correspondent on a deadline.

If it’s an economics story, what does your share of debts or GDP represent? A new car? A house? How many vacations? How many pizzas? How would it be, for instance, if everyone had the debts of the average Greek citizen? (awful, in most countries). How would global warming be if everyone emitted greenhouse gases at the rate of an Indian? (much better). The U.S. debt works out at about $50,000 a person — what can you buy with that?
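The arithmetic behind such a converter is simple division, as this small sketch shows. The $15 trillion debt figure comes from the passage above; the population of roughly 313 million and the $10 pizza are assumptions added for illustration, and the function names per_capita and in_units are hypothetical.

```python
def per_capita(total, population):
    """Share of a national total that falls on each person."""
    return total / population


def in_units(amount, unit_price):
    """How many items of a given price the amount would buy."""
    return amount / unit_price


US_DEBT = 15e12          # dollars, as quoted by Salopek and Doyle
US_POPULATION = 313e6    # assumption: approximate 2012 U.S. population
PIZZA_PRICE = 10.0       # assumption: illustrative price in dollars

share = per_capita(US_DEBT, US_POPULATION)
pizzas = in_units(share, PIZZA_PRICE)
summary = f"About ${round(share, -3):,.0f} per person, or roughly {pizzas:,.0f} pizzas"
```

Under these assumptions each person's share comes to about $48,000, in line with the "about $50,000" in the quote. A real version of the site would pull both the totals and the unit prices from cited datasets rather than hard-coded constants.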

The site would be user-maintained, like Wikipedia, and powered by real datasets. All statistics would require citations. It’s just an idea at this point, but a website like this is very buildable. (Anyone want to try it? Leave a comment below.) Salopek and Doyle offer a dizzying number of potential cross-discipline conversion units. How about Ayns, a unit of measure for how friendly a government is to corporations, named for Ayn Rand? Or the Obama Gap, a measure of the difference between a leader’s domestic and foreign approval ratings? Or Jolies, a unit of a country’s developmental aid as proportional to the amount of attention it has received from Angelina Jolie? (The Economist’s long-running Big Mac index is of similar spirit.)

Along with the three projects mentioned above, a couple of others caught my eye: Nathan Matias’s Data Forager, which slurps up all the Twitter handles mentioned on a webpage and builds a Twitter list that follows those people, and Arlene Ducao’s OpenIR, a much larger project that overlays multiple layers of satellite imagery on a map.

To paraphrase Zuckerman, I hope these ideas earn at least 40 nanoKardashians of your attention today.

Photo by Solo used under a Creative Commons license.
