There’s no such thing as an objective filter: Why designing algorithms that tell us the news is hard

Technologists and humanists take different approaches — and speak different languages.

We are all immersed in an incomprehensible abundance of available information, and we can only read or watch or consume a tiny fraction of it. What we see, and what we don’t see, is heavily mediated by information filtering algorithms: Google web search, the Facebook news feed, personalization and recommendation engines of all kinds. Filtering algorithms shape our knowledge and our society. They are now a permanent part of what it means to perceive the world.

So it’s important to design them well. But what makes a “good” filtering algorithm? There’s no easy answer.

How can there be? The question of who should see what when is not like the question of how far the moon is from the Earth, or what the mayor ate for breakfast today, or whether Elvis is still alive. Those questions have simple answers that are either right or wrong. But choosing who sees what information is not like that at all: There’s no one answer, just many possible visions of the type of world we’d like to live in.

Yet we still need answers. Otherwise there’s no way to build the algorithms we must have, and no way to critique them.

Different disciplines have different approaches to this problem. At the risk of caricature, let’s say there are two broad camps here. The “technologists” are engineers, computer scientists, people with training in quantitative fields. They are the people likely to be directly responsible for building our filtering systems. The “humanists” are editors, curators, writers, sociologists, humanities scholars. They spend their lives on the handcrafted work of deciding that this, and not that, deserves our attention — or examining the consequences of such choices. These two cultures need each other, but they don’t seem to speak the same language.

Technologists have long used two numbers to measure how well a search engine works. “Precision” counts what fraction of the returned items were actually relevant to a user’s query, while “recall” measures what fraction of all the relevant items the search actually returned. Together, these two percentages — one tracking false positives, the other false negatives — give a clear indication of search engine performance. You can make a change to the algorithm and see if the numbers go up or down, and by how much. Recommendation engines are another type of filtering system, and in practice each optimizes for some numerical value. A retailer like Amazon might measure how much each customer ends up buying, while Netflix is more concerned with choosing movies that you’re going to like (as measured by the ratings you assign). News personalization systems often use the number of clicks on your customized headlines as a proxy for success.
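
To make those numbers concrete, here is a toy calculation, written in Python, of precision and recall for a single query. The item IDs and the relevance judgments are made up for illustration.

    # Toy example: precision and recall for one search query.
    # The item IDs and the human relevance judgments are hypothetical.
    returned = {"a", "b", "c", "d", "e"}   # what the engine returned
    relevant = {"b", "c", "f", "g"}        # what a human judged relevant

    hits = returned & relevant             # relevant items that were actually returned

    precision = len(hits) / len(returned)  # 2/5 = 0.40 of the results were relevant
    recall = len(hits) / len(relevant)     # 2/4 = 0.50 of the relevant items were found

    print(precision, recall)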

These sorts of measures are crucial for engineering work, but of course they miss much. Search engine performance metrics only work if you, the human, already have a clear idea of what counts as “relevant” for a given query. Dollars and clicks are useful metrics, but they squeeze rich social interactions into a limited economic mindset.

Humanists tend to think about things like whether the user “needs” the information, or whether it might challenge them, inspire them, or teach them something new. Or they might be concerned about filter bubbles, the idea that personalized information filters will end up telling us only what we want to hear, fragmenting us into smaller and smaller factions that never really talk to one another. Other people wonder about serendipity in online information systems, imagining ways to help us discover things we might never have looked for. All of these people ask difficult questions about how our media shape our societies, and what’s good or bad about that.

But how can we actually know if any particular system is producing “filter bubbles” or not? And given the choice between two algorithms that might be quite similar, how should we pick one over the other? Descriptions of humanistic qualities are usually far too vague to be turned into code. Public Insight Network co-founder Andrew Haeg has suggested that a good filtering algorithm should align with Maslow’s hierarchy of human needs. Sounds great, except that I have no idea how to translate that into an algorithm, or how I would test my code against the real world to see if it satisfied those needs. As sociologist Stuart Hall famously put it:

News values are one of the most opaque structures of meaning in modern society. All “true journalists” are supposed to possess it; few can or are willing to identify and define it. Journalists speak of “the news” as if events selected themselves… Yet of the millions of events which occur every day in the world, only a tiny portion ever become visible as “potential news stories.”

Technologists often have a refreshing pragmatism here: Facebook ran a controlled experiment with 250 million users to see what effect removing something from your news feed has on whether or not you eventually see it. Google continually tests search algorithm changes by asking thousands of users whether they like the results in Column A or Column B better. But this kind of engineering is no substitute for envisioning truly new ways of organizing information; decades ago, Horst Rittel and Melvin Webber explained why “optimization” will not solve societal problems, inventing the term “wicked problem” in the process.
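
For readers who have never seen this kind of testing, here is a bare-bones sketch in Python of the sort of A/B comparison involved. The hash-based bucketing and the two-proportion z-test are generic illustrations, not a description of Facebook’s or Google’s actual machinery.

    # Bare-bones A/B test sketch: split users into two groups, show each group a
    # different ranking, and ask whether the difference in click-through rate is
    # bigger than chance. Generic illustration only.
    import hashlib
    from math import sqrt

    def bucket(user_id):
        # Deterministically assign each user to variant "A" or "B".
        h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
        return "A" if h % 2 == 0 else "B"

    def z_score(clicks_a, views_a, clicks_b, views_b):
        # Two-proportion z-test on click-through rates; |z| above roughly 1.96
        # is conventionally treated as significant at the 95 percent level.
        p_a, p_b = clicks_a / views_a, clicks_b / views_b
        p = (clicks_a + clicks_b) / (views_a + views_b)
        se = sqrt(p * (1 - p) * (1 / views_a + 1 / views_b))
        return (p_a - p_b) / se

    print(bucket("user-42"))                  # which ranking this user would see
    print(z_score(1300, 10000, 1200, 10000))  # hypothetical aggregate results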

Conversely, humanists are often unwilling or unable to grapple with the dirty technical details of how algorithms are built, and what is actually possible. It took a computer programmer, Seth Finkelstein, to explain how algorithmic constraints create the sociological effects of Google’s web search. Nice work, and there’s an argument to be made that anyone contemplating filtering algorithms should understand the engineering involved. But we’re also going to need simple ways to explain all of this to non-specialists. Many people accused Twitter of censorship when #occupywallstreet failed to make the trending topics list despite simultaneous protests in dozens of cities. In fact there was no censorship, just a quirk of the trend-finding algorithm — an algorithm that “claims to know the mind of the public.”
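
The “quirk” is easier to see with a toy model. Trend-finding algorithms typically reward a sudden spike in mentions relative to a topic’s own recent baseline, not sheer volume, so a topic that grows steadily for weeks can be outscored by something that came out of nowhere. Here is a sketch in Python; the counts and the ratio-based scoring are invented for illustration, not Twitter’s actual method.

    # Toy trend detector: score a topic by how sharply its latest mention count
    # spikes above its own recent average, not by raw volume.
    def trend_score(hourly_counts):
        baseline = sum(hourly_counts[:-1]) / (len(hourly_counts) - 1)
        return hourly_counts[-1] / (baseline + 1)

    steady_protest = [900, 950, 1000, 1050, 1100]  # large but steadily growing topic
    sudden_meme = [5, 10, 8, 12, 400]              # small topic that suddenly spikes

    print(trend_score(steady_protest))  # about 1.1: barely above its own baseline
    print(trend_score(sudden_meme))     # about 41: a huge spike, so it "trends"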

Moreover, it’s not just about code. All filtering algorithms operate on human input, and some are much more reliant on humans than others. Web search engines look at the links between pages, links that were put there by people. Collaborative filtering algorithms based on “likes” or voting (à la Reddit, Digg, Slashdot, etc.) solicit direct feedback. The result of any filtering algorithm depends on a complex interaction between the code that implements it and the people who use it. The same algorithm in different social contexts, or administered by different people, can produce wildly different results. We have to look at culture and code together.
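
As a tiny illustration of how thin the code layer can be relative to the human input it runs on, here is a toy vote-driven ranking in Python. The log-plus-time-decay scoring is a common pattern on voting sites, but the exact formula here is invented for the example.

    # Toy vote-driven ranking: the code is trivial, and the output is almost
    # entirely a function of which stories people chose to upvote and when.
    import math
    import time

    def hot_score(upvotes, downvotes, posted_at):
        votes = upvotes - downvotes
        order = math.log10(max(abs(votes), 1))  # diminishing returns on raw votes
        sign = (votes > 0) - (votes < 0)
        age_hours = (time.time() - posted_at) / 3600
        return sign * order - age_hours / 12    # older stories sink

    stories = [
        {"title": "City council vote", "up": 530, "down": 40, "t": time.time() - 2 * 3600},
        {"title": "Cat does a thing", "up": 900, "down": 50, "t": time.time() - 10 * 3600},
    ]
    for s in sorted(stories, key=lambda s: hot_score(s["up"], s["down"], s["t"]), reverse=True):
        print(s["title"])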

So filtering algorithm design is one of those wildly interdisciplinary problems. The challenge is to imagine systems that:

  • advance societal goals that we think are important, stated precisely enough to serve as performance yardsticks,
  • combine algorithms with humans in a productive way, and
  • can actually be built with available technology.

That’s very hard. It requires a rare type of cross-domain thinking, because we don’t yet really know how to combine the pragmatic demands of technology with the social aspirations of the humanities. But it’s also an exciting time to be working in digital journalism, where these two cultures meet every day.

Algorithmic image by Anders Hoff used under a Creative Commons license.

POSTED     June 18, 2012, 10:45 a.m.
SEE MORE ON Aggregation & Discovery