May 29, 2012, 1:50 p.m.
Reporting & Production
3 new ideas on the future of news from MIT Media Lab students

Adding metadata to hyperlinks, finding stories in ordinary datasets, providing context for impossibly big numbers.

Keys on computer keyboard spelling "geek"

Ethan Zuckerman of the MIT Center for Civic Media taught a class this semester tailor-made for Nieman Lab readers: “News in the Age of Participatory Media.” The hook: What happens if you treat journalism as an engineering problem, bringing together the efforts of journalists and computer scientists?

The course’s final class last week featured a lot of bright students presenting their final projects, each of which was supposed to be a new tool, technique, or technology for reporting the news. (They were in various stages of completion.) I’ll be breaking out a few of the good ideas in future posts, but here are some of the ones that stood out to me.

Modernizing the hyperlink

The <a> tag hasn’t changed much since Tim Berners-Lee proposed it 20 years ago. Hyperlinks are the fiber of the web. But Neha Narula, a Ph.D. student of computer science at MIT, finds herself frustrated with writers who abuse them. Blog posts littered with too many links lead to “cognitive overload,” she says. “As I explored this topic a little more,” she said, “I found what I was annoyed with was not linking too much but not linking well.” If Google is mentioned in copy, does Google have to be linked to the Google home page? Does the same link need to appear multiple times in one story?

Narula proposed the use of microformats and the little-known rev attribute to attach semantic meaning to links, allowing browsers to handle different kinds of links differently. (rev is like a cousin to rel: It’s supposed to represent a reverse link. All major browsers, when faced with a rev attribute now, just ignore it.)

For example, a link to a citation (dictionary definition, Wikipedia article) would get rev="bib", for bibliography. So:

<a href="" rev="bib">

might lead to that link being presented not in the body copy, but at the bottom of the post, in the form of a tidy bibliography.

She also proposes rev="reaction", which would clearly call out the original post an article is responding to; and rev="object" for links to people and companies, which would facilitate an index for all of the proper nouns in a piece.

Perhaps most intriguing was rev="set" for a series of links, to avoid awkwardness when linking to (for example) this series of Lab articles on the hyperlinking debate. She mocked up a little bit of JavaScript and CSS to show how it could look. (Hover over “Twitter users to follow” or “BBC linking policy.” You can also see mockups of the object, reaction, and bib attributes there.)
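To make the idea concrete, here is a minimal sketch of how a tool might read rev values and sort links into groups, for instance to pull every rev="bib" link into a bibliography at the bottom of a post. It is not Narula's code (her mockup used JavaScript and CSS); the class and function names, the "inline" label for links with no rev value, and the example.org URLs in the usage note are all assumptions made for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class RevLinkCollector(HTMLParser):
    """Collects (link text, href) pairs, grouped by each link's rev value."""

    def __init__(self):
        super().__init__()
        self.groups = defaultdict(list)
        self._open = None  # (rev, href, text parts) for the <a> being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            # "inline" is our own label for links with no rev attribute.
            self._open = (a.get("rev") or "inline", a.get("href") or "", [])

    def handle_data(self, data):
        if self._open:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._open:
            rev, href, parts = self._open
            self.groups[rev].append(("".join(parts).strip(), href))
            self._open = None


def group_links(html):
    """Return a dict mapping each rev value to its list of (text, href)."""
    parser = RevLinkCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.groups)


def render_bibliography(groups):
    """Format the rev="bib" links as a numbered plain-text list."""
    bib = groups.get("bib", [])
    return "\n".join(
        f"{i}. {text} <{href}>" for i, (text, href) in enumerate(bib, 1)
    )
```

Feeding group_links() a post containing a rev="bib" link to https://example.org/def and a rev="object" link to https://example.org/acme would return one group per rev value, and render_bibliography() would list only the first.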

Oh, and the biggest crowd pleaser was a feature you may love or hate: a button that toggles off all links in a document for distraction-free (or, er, context-free) reading. (Try it on this article!)
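The toggle in Narula's demo runs in the browser, but the same effect can be approximated offline. The sketch below is an assumption-laden stand-in, not her implementation: it uses a regular expression to delete <a> and </a> tags while keeping the text between them. A regex is fine for simple markup but would trip on attribute values that contain a ">" character; the function name is hypothetical.

```python
import re

# Matches <a>, <a ...attributes...> and </a>, but not tags such as <abbr>.
A_TAG = re.compile(r"</?a(\s[^>]*)?>", re.IGNORECASE)


def strip_links(html):
    """Remove link tags from html, leaving the linked text in place."""
    return A_TAG.sub("", html)
```

Other inline markup, such as <em> inside a link, survives untouched, so the copy reads the same with the links gone.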

Others have proposed approaches to adding metadata to links, from nofollow to syndication-source to standout to FOAF. Zuckerman suggested Narula create WordPress and Drupal plugins to encourage adoption. Getting the rest of the web on board would be a tall order.

Searching for correlations in a haystack

Eugene Wu, a graduate student of computer science at MIT, demonstrated a suite of tools called DBTruck that makes data comparison a snap. Enter the URL of a CSV file, JSON data, or an HTML table and DBTruck will clean up the data and import it to a local database. Normally you might go to a web page like this, select and copy the table, paste it into an Excel spreadsheet, then spend 15 minutes trying to fix the misplaced cells and formatting issues. DBTruck is automated and fast.
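For a sense of what "clean up and import" involves, here is a rough sketch of the CSV case using only Python's standard library. It is not DBTruck's code or API; the function names and the cleanup rules (trim whitespace, skip blank lines, pad or truncate ragged rows, store everything as text) are assumptions chosen to keep the example short.

```python
import csv
import io
import re
import sqlite3


def clean_name(raw, fallback):
    """Turn a messy header such as ' Rate (%)' into a safe SQL identifier."""
    name = re.sub(r"\W+", "_", raw.strip().lower()).strip("_")
    if not name:
        return fallback
    return "c_" + name if name[0].isdigit() else name


def import_csv(csv_text, table, conn):
    """Load csv_text into a new table on conn; return the cleaned column names.

    Assumes the first non-blank row is the header and that cleaned header
    names are unique. Every value is stored as TEXT.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text))
            if any(cell.strip() for cell in r)]  # drop blank lines
    cols = [clean_name(h, f"col_{i}") for i, h in enumerate(rows[0])]
    table = clean_name(table, "imported")
    col_sql = ", ".join(f'"{c}" TEXT' for c in cols)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    marks = ", ".join("?" for _ in cols)
    for row in rows[1:]:
        cells = [cell.strip() for cell in row][:len(cols)]  # drop extras
        cells += [None] * (len(cols) - len(cells))          # pad short rows
        conn.execute(f'INSERT INTO "{table}" VALUES ({marks})', cells)
    return cols
```

Calling import_csv(text, "births", sqlite3.connect(":memory:")) leaves a queryable table behind, which is the point: once the data is in a database, comparing it with other tables is a matter of writing a query.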

The program allows you to geocode any field that contains address information, whether that field is “Cambridge, MA” or “Cambridge, Mass.” or “1 Francis Ave, Cambridge.” Humans have come up with many ways to represent physical locations, but geographic coordinates are unambiguous instructions for computers to map a location. When you’re dealing with disorganized datasets, getting consistency is key.

Wu’s tool then lets you plot arbitrary comparisons between datasets. To test the program he plugged in all kinds of datasets, just for fun. Is there a correlation between addresses of Massachusetts lottery winners and Taco Bell locations? (No.) Suicide rates and unemployment rates in New York state? (No.) Then he stumbled upon a connection that made sense: In New York state communities, high teen pregnancy rates correlated strongly with low birth weights. There’s a potential story there that Wu might not have otherwise set out to write. Zuckerman advised Wu to team up with The Boston Globe to run more arbitrary comparisons and discover what local stories might be hidden in the numbers. (It also seems like a dandy add-on to the PANDA Project, which is building a platform for in-house newsroom databases.)
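The article does not say which statistic Wu's tool computes, so the sketch below assumes the most common one, the Pearson correlation coefficient, and shows how a program could test every pair of columns and surface only the strong relationships. The function names and the 0.8 cutoff are assumptions for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers (-1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def strongest_pairs(columns, threshold=0.8):
    """Compare every pair of named columns; keep those with |r| >= threshold.

    columns maps a column name to its list of values, one per community.
    Results are sorted with the strongest relationship first.
    """
    hits = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            hits.append((name_a, name_b, r))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)
```

One caution for anyone running comparisons like this at scale: test enough arbitrary pairs and some will correlate by chance, so a hit is a lead to report out, not a finding.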

How many Rhode Islands is that?

Nieman Fellow Paul Salopek and Knight Science Journalism Fellow/Reuters correspondent Alister Doyle have covered large-scale calamities in far-off countries for domestic audiences sometimes too busy to care. Foreign correspondents have tricks, sometimes clichés, to get people to pay attention, comparing populations and land masses to familiar American things. Write Salopek and Doyle:

Too often we just get a giant number — the U.S. debt is $15 trillion, Chinese greenhouse gases are the highest in the world at 7 billion tonnes a year, Americans spend $8 billion a year on cosmetics, etc. Is there some way of helping to put these statistics — huge to the point of meaningless — into an understandable, human framework?

They propose something like a currency converter that turns impossibly big numbers into more qualitative terms. Great for a correspondent on a deadline.

If it’s an economics story, what does your share of debts or GDP represent? A new car? A house? How many vacations? How many pizzas? How would it be, for instance, if everyone had the debts of the average Greek citizen? (awful, in most countries). How would global warming be if everyone emitted greenhouse gases at the rate of an Indian? (much better). The U.S. debt works out at about $50,000 a person — what can you buy with that?
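The arithmetic behind that last figure is simple enough to sketch. The $15 trillion debt comes from the proposal above; the U.S. population of roughly 313 million (about right for 2012) and the $25,000 car price are assumptions added here for illustration, as are the function names.

```python
def per_capita(total, population):
    """Split a national total evenly across a population."""
    return total / population


def in_units(amount, unit_cost):
    """Express a dollar amount as a count of some familiar item."""
    return amount / unit_cost


US_DEBT_USD = 15e12             # figure quoted in the proposal
US_POPULATION = 313e6           # assumption: rough 2012 estimate
ASSUMED_CAR_PRICE_USD = 25_000  # hypothetical price, for illustration only

share = per_capita(US_DEBT_USD, US_POPULATION)
cars = in_units(share, ASSUMED_CAR_PRICE_USD)
print(f"Each person's share of the debt: ${share:,.0f}")
print(f"At the assumed price, that is {cars:.1f} new cars")
```

With those inputs the per-person share comes to roughly $48,000, in line with the "about $50,000" cited above. A real version of the site would swap the hard-coded constants for cited datasets.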

The site would be user-maintained, like Wikipedia, and powered by real datasets. All statistics would require citations. It’s just an idea at this point, but a website like this is very buildable. (Anyone want to try it? Leave a comment below.) Salopek and Doyle offer a dizzying number of potential cross-discipline conversion units. How about Ayns, a unit of measure for how friendly a government is to corporations, named for Ayn Rand? Or the Obama Gap, a measure of the difference between a leader’s domestic and foreign approval ratings? Or Jolies, a unit of a country’s developmental aid as proportional to the amount of attention it has received from Angelina Jolie? (The Economist’s long-running Big Mac index is of similar spirit.)

Along with the three projects mentioned above, a couple of others caught my eye: Nathan Matias’s Data Forager, which slurps up all the Twitter handles mentioned on a webpage and builds a Twitter list that follows those people, and Arlene Ducao’s OpenIR, a much larger project that overlays multiple layers of satellite imagery on a map.
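The first half of what Data Forager does, finding the handles, can be sketched in a few lines. This is not Matias's code: the function name and pattern are assumptions, based on Twitter's rule that usernames are at most 15 letters, digits, or underscores. Building the actual list would require Twitter's API and is left out.

```python
import re

# An @ not preceded by a word character or another @ (which skips email
# addresses), followed by 1 to 15 letters, digits, or underscores.
HANDLE = re.compile(r"(?<![\w@])@(\w{1,15})\b", re.ASCII)


def find_handles(text):
    """Return the unique handles in text, in order of first appearance.

    Handles are compared case-insensitively, since @Name and @name
    point to the same account.
    """
    seen = set()
    handles = []
    for name in HANDLE.findall(text):
        if name.lower() not in seen:
            seen.add(name.lower())
            handles.append(name)
    return handles
```

Run over the text of a page, this would return each mentioned account once, ready to hand to whatever builds the list.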

To paraphrase Zuckerman, I hope these ideas earn at least 40 nanoKardashians of your attention today.

Photo by Solo used under a Creative Commons license.

