Nieman Foundation at Harvard
July 8, 2014, 1:56 p.m.
Reporting & Production

Q&A: Tarleton Gillespie says algorithms may be new, but editorial calculations aren’t

“We’re always navigating information and culture by way of these mechanisms, and every mechanism has a built-in notion of what it’s trying to accomplish.”

Should Facebook be allowed to decide what information we do or don’t see? Should Google be responsible for ensuring that their search results don’t offend or incriminate? If we allow platforms to determine what content and information we encounter, are we defaulting on our civic responsibilities?

Lately, it seems questions like these — questions about the algorithms that govern and structure our information networks — are raised more and more frequently. Just last week, people were outraged when it was discovered that Facebook had tried to study the spread of emotion by altering what type of posts 600,000 users saw. But the reality is we know less and less about how news content makes its way to us — especially as control of those information flows becomes more solidified in the hands of technology companies with little incentive to explain their strategies around content.

Tarleton Gillespie has done a considerable amount of writing on what we know about these algorithms — and what we think we know about them. Gillespie is an associate professor of information science at Cornell, currently spending time at the Microsoft Research Center in Cambridge as a visiting researcher. We first met at the MIT Media Lab, where Gillespie gave a talk on “Algorithms, and the Production of Calculated Publics.” His writing on the subject includes a paper titled “The Relevance of Algorithms” and an essay called “Can an algorithm be wrong?” More recently, he contributed to Culture Digitally’s #digitalkeywords project with a piece on algorithms that explains, among other things, how it can be misleading to generalize the term.

We touched on the conflict between publishers and Facebook, Twitter trends, the personalization backlash, yanking the levers of Reddit’s parameters, and how information ecosystems have always required informed decision making, algorithms or no. Here’s a lightly edited transcript of our conversation.

Caroline O’Donovan: What is the #digitalkeywords project, and why did you think “algorithm” was something that was important to define?

Tarleton Gillespie: The #digitalkeywords project is a project Ben Peters is organizing. The short of it is that it’s inspired by Raymond Williams’ “Keywords” collection from 1976. He wanted to get scholars to think about the terminology that matters now, and would be important both to scholarship around new media, and also to broader public audiences.

For me, I’ve been finding myself thinking about the term algorithm quite a bit in the last couple years. In part, it came from a research project that is my main project, which is thinking about how social media platforms govern speech that they find problematic. That could be classic categories like sex and violence, it could be hate speech, it could be an array of things. But the question is, how are they finding themselves in charge of this problem of removing things that are unacceptable — finding themselves as cultural gatekeepers as publishers and broadcasters have had to be — and how do they go about doing it? And how do they justify those techniques?

As I was trying to figure that out, I was thinking about a whole array of things — like putting things behind age barriers, or safe search mechanisms, or blocking by country — and I was calling them algorithmic techniques. That made me think about updating a long literature about how technology matters, how you can design things so as to govern use — but how do you do that in an algorithmic sense, rather than, say, in a mechanical sense?

O’Donovan: You mention the way that social media platforms have, in a way that they maybe didn’t even expect to, taken over a role that used to be strictly in the domain of publishers. One thing that stuck out to me in some of what you’ve written is the question: Is having to make arbitrary decisions that seem objective an entirely new problem? You argue that it’s not a new problem, it’s just a new way of dealing with it — I think you call it a new information logic.

Gillespie: It’s in some ways a very old problem. NBC has to decide what’s acceptable at 8 p.m. And they do that within some guidance of what the FCC says, but mostly they’re working within those barriers, and deciding what they think their audience will accept, what they think their moral compass is, what their advertisers will blanch at.

Now you’ve got Facebook and Apple and YouTube being in a similar situation, but I think the game is different. You can do classic stuff, like setting rules and deciding where the line is, but the other thing you can do now is you can design the platform so that you manage where those things go. It’s not like there’s no precedent for that either — you can build a video rental store and put all the porn in the back room, and have rules about who gets in there and who doesn’t. That’s using the material architecture and the rules to keep the right stuff in front of the right people and vice versa. But you can do that with such sophistication when you think about how to design a social media platform. You know much more about people, you know much more about what their preferences are, about where they’re coming from, about the content. You can use the algorithm — the very same algorithm that’s designed to deliver what you searched for — to also keep away what they think you don’t want to see.

O’Donovan: That’s an interesting point, too — what they “think” you don’t want to see. You’ve written that the way Google knows if their search algorithm is working right is if the first five results have a lot of clickthrough. But when it comes to a more editorial judgment, how do you measure satisfaction? And is satisfaction even what we really want news consumers to experience?

Gillespie: That gets to this bigger question: What are algorithms a stand-in for? We hear all the time about how powerful Google’s algorithm is. It sounds very precise, it sounds very mathematical. But in reality, algorithms can be trained to look for a lot of things, and they are trained on a set of user practices, they’re trained on a set of criteria that are decided by the platform operators, and what counts as relevant is just as vague as what counts as newsworthy. It’s not a meaningless word, but it’s a word that has a lot of room for interpretation, and a lot of room to have certain kinds of assumptions and categories built in but be relatively invisible.

O’Donovan: What do you feel like the future of personalized news content is? Did we maybe go a little too bold at first with our belief that that was possible, or that it would be satisfying, or that it was something that people or publishers really wanted?

Gillespie: I would love to think that the pendulum is going to swing back. A lot of this is driven by what a particular platform thinks it can do. There’s a lot of push toward trying to predict what a user might want, well outside of journalism. Search results and advertising and news fell into that.

Journalism is the place where it got pushed back the most, because we have a public interest in not just encountering what we expect to see, that’s much stronger than the same feeling about advertising, right? I don’t like predicting the future, because I’m terrible at it, but the enthusiasm about personalization has shifted a bit.

In some ways, I feel like what we’re seeing now is throwing every possible kind of slice at us. Any slice — here’s what we think you want, here’s what we think you asked for, here’s what we think about what everyone else is doing, here’s what your circle of friends is doing, here’s what’s timely, here’s what’s editorially interesting, here’s what’s connected to our advertising — so in some ways we’re being given many slices through the available content, with personalization just being one or maybe a couple of those.

I don’t think it’s going to go away. I think it remains one of the ways in which a platform can slice up what it has and try to offer it up. I do think the gloss has gone off it a little bit, and probably for a good reason.

O’Donovan: You write about the feeling of being satisfied by personalization. For me, occasionally a Pandora station I make will deliver what I didn’t know I wanted. But there’s a tension between that feeling of satisfaction when an algorithm performs the way you wanted it to, and this other feeling that we don’t want to be quantified — we don’t like the idea of this cold, objective robot making these decisions for us. How do you think about which of those two reactions wins out?

Gillespie: I think in some ways I worry that those two sides miss the real third way. We do have that experience where an algorithm seems to work really well. Satisfaction is one way to understand that, that feeling of accomplishment — it was exactly what I needed, it was quick, it was effective. And the opposite, what you were saying about the kind of coldness of it.

I think what that hides is the way that, for a long time, we have navigated information in part through mechanisms that don’t belong to us that try very hard to provide something for us. They weren’t always calculational: Music reviewers are an intensely important way that we decide to encounter things that’s both appealing to us and can work really well. When someone suggests something that we never would have heard, it’s completely moving to us, and exciting.

And we can be frustrated by it: These people are cultural gatekeepers. Are they attuned to what we really care about? Are they culturally biased? Elitist? We struggle with that. Similarly, when we deal with the quantified versions — the pop charts — is that an amazing glimpse of what people really are interested in? Or is it a weird artifact of what weird middle ground material can make it above all the more interesting stuff?

We’re always navigating information and culture by way of these mechanisms, and every mechanism has a built-in notion of what it’s trying to accomplish. That’s the part we need to unpack. There’s always going to be a tool that says if you’re interested in this, listen to this. But the assumptions that tool makes about what it should look for, what it is we seek, and what’s important about that form of culture — whether it’s journalism or music or whatever — that’s the part we have to unpack.

O’Donovan: Do you think that some of that confusion, or obfuscation, could be reduced by these companies labeling things more clearly? We think Trending is a specific thing, but maybe Twitter could use a different word that is more specific, or accurate. We think we know what a News Feed is, but it doesn’t look or feel like we think the “news” should be. I’m not necessarily talking about more transparency in terms of what actually goes into the rules — but if they did a better job at explaining what it is they’re trying to do, besides just saying, We’re serving the content that you like, or, We’re serving the content that’s popular right now.

Gillespie: I think that would be really terrific. There are obviously two obstacles for them in being much clearer than they are. One is, obviously, these aren’t just services — they are commodities and advertisements for themselves, so they have to be catchy. The bigger one is not wanting to reveal too much of their secret sauce. They feel like these algorithms need to be guarded. They don’t want them to be gamed by search engine optimization or trend optimization or whoever it would be. They don’t want competitors to be able to just lift that and build their replacement.

But it seems like there would be room for a kind of explanatory clarity that’s not the same as giving away exactly how the algorithm works in specific terms, but that honors the fact that these different glimpses are different, and they differ in kind, not just sort of on the surface. Twitter has been relatively forthcoming about what Trends is, maybe as best as it could be.

But I like a model like Reddit. When you go to the top of Reddit, there’s about five choices of how to organize — there’s trending, there’s hot, there’s controversial — and you can read about what those things are. It’s not that that’s the perfect way to do it, but at least the fact that there are different slices reminds you that these slices are potentially different. I think that they could have a little justification about how they thought about it.

One of the things I found really interesting about Twitter Trends is that they’ll weight tweets or hashtags that appear across different clusters of people that aren’t connected to each other on Twitter higher than a lot of activity that happens in a densely connected cluster of people.

You can imagine the opposite choice, where something that happens in a cluster of people really intensely, but isn’t escaping — maybe that’s exactly what should be revealed. Something like: You may not know anything about this, but somewhere, there’s a lot of discussion about this, and you may want to know what that is. That fulfills a very different public or journalistic thing. Yes, there are things that seem to be talked about on a wide basis, and we want to reflect those back, but we also want to say, over here in the world, in a place you don’t have access to, there’s something going on.

Even if you just had those two things next to each other and talked a bit about how they’re different, you’d offer users a way to think about their difference and make choices. And it would push the platform to think about why the difference might matter.

O’Donovan: It’s interesting to think that a Facebook gets you to buy in to the idea of a social network, but now is their main product that algorithm? And if it is, which of the platforms is going to be the first to offer, like you’re saying, these different slices — a series of different algorithms? Instead of just offering one as a product, what they can boast is: We have fifteen different ways to search an unlimited corpus of data.

Gillespie: I would think that Twitter would be in a more likely position than Facebook, only because in their sense of themselves they seem to have an emphasis on the public role they play — the democratic role they play. It’s not surprising that Reddit is a little farther ahead, because it’s much more fascinated with the technical element of this, and it wears the technical element on its sleeve.

Now, it’s not clear that Trending and Controversial and Hot and New are the right four slices, or do the work that I’m hoping they would do to reveal how these things are different.

In some ways, the other way you do this — and it starts to sound like personalization, but I don’t mean it this way — is to let the user play with the parameters of those differences. I don’t mean, Boy, I hope I could set it so I can get all domestic and no international — that’s the worst problem of personalization. But, show me Hot and show me Trending and show me Controversial, and then let me pull the levers and change the parameters a little bit, and see what that does to the ranking. That recognition that even one algorithm’s criteria shift based on the parameters, seeing that happen — not just knowing, intellectually, but seeing it happen — would be a pretty interesting glimpse into how much the choices are built into the apparatus.
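The “pull the levers” idea can be sketched in a few lines of code. This is a purely illustrative toy, not any platform’s actual ranking: the post data, the scoring terms, and the weight names are all invented here. The point is just that the same pool of content reorders visibly as a user adjusts the parameters behind a “Hot”- or “Controversial”-style slice.

```python
# Toy illustration of user-adjustable ranking parameters.
# All data and weights are hypothetical, not any real platform's algorithm.

posts = [
    {"title": "Local story", "upvotes": 120, "downvotes": 10, "age_hours": 2},
    {"title": "Divisive op-ed", "upvotes": 300, "downvotes": 280, "age_hours": 5},
    {"title": "Big viral hit", "upvotes": 900, "downvotes": 40, "age_hours": 30},
]

def score(post, w_popularity=1.0, w_recency=1.0, w_controversy=0.0):
    """Combine a few simple signals; the weights are the user's 'levers'."""
    popularity = post["upvotes"] - post["downvotes"]
    recency = 1.0 / (1.0 + post["age_hours"])        # newer posts score higher
    controversy = min(post["upvotes"], post["downvotes"])  # disputed posts score higher
    return (w_popularity * popularity
            + w_recency * 100 * recency
            + w_controversy * controversy)

def rank(posts, **weights):
    """Return titles ordered by the chosen weighting."""
    return [p["title"] for p in
            sorted(posts, key=lambda p: score(p, **weights), reverse=True)]

# Pulling the levers produces different slices of the same content:
print(rank(posts))                      # default: raw popularity dominates
print(rank(posts, w_recency=5.0))       # a "Hot"-style slice favoring recency
print(rank(posts, w_controversy=10.0))  # a "Controversial"-style slice
```

Seeing the order change as the weights change is the “glimpse into how much the choices are built into the apparatus” that Gillespie describes: no single ordering is neutral, because every slice encodes a decision about which signals matter.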

O’Donovan: Talk about data literacy! The idea of being able to one day pull all those levers yourself — that any person’s understanding of how they get information involves tweaking all these things — that’s an interesting view of the future.

Gillespie: Yeah, well, to me it’s also a glimpse of the past. If we think about traditional media, understanding the levers that were at work there, about the choices that people made to decide what was on the front page of the paper, or what was going to be broadcast this season — were those criteria being driven by certain kinds of sources? What were the assumptions being made there? Those things were very far from where a regular viewer or reader could even access. And they were only glimpse-able in the moments either when there was a crisis of confidence — a newspaper blew it and had a big embarrassment and you could see the inner workings — or when you had a clever sociologist who could get in there and talk about how it works. Now, at least, the parameters could be made more visible, if that were a feature that could be provided.

O’Donovan: I really liked that point — that in some ways an editorial meeting is more clouded for the typical reader than an algorithm is, because there’s more chance for fallibility, so there’s more incentive to make it hard for someone to understand how those decisions are made.

Gillespie: There’s definitely a lot of obscurity to be thought about. I wouldn’t want to paint it where the editorial decisions of a newspaper are totally obscured and the algorithm could be totally clear. I think the researchers that have written about news values and the calculations of editors that were about newsworthiness and timeliness and what an audience cares about — that was a kind of algorithm. There were certain things you would look at differently, and it would produce different outcomes. And you can only make that so visible, right?

O’Donovan: What’s interesting to me is that The New York Times maintains its great reputation even though it’s being beat out for traffic — there was a part of the innovation report that said sites like The Huffington Post are just repackaging Times content and getting more traffic from it than they are.

The Times gets to say, We do great journalism. Then, on the other side of the coin you have a BuzzFeed, which is about as close as you can come to gaming a social algorithm, I think, but their reputation is bad in a lot of circles. Still, they’re reaching so many more people that way. I don’t know if there’s a way to have both, but it seems like there should be a middle ground.

Gillespie: I don’t know how you achieve that middle ground. I think we’re in a place where it’s awfully easy to recirculate stories that are produced.

It reminds us what an incredible mechanism it was to say, We’re going to be a newspaper that not only reports the story, writes the story, checks the story, produces the story, but then also manages turning it into paper, delivering it to street corners, having people sell it, managing subscriptions — that’s an incredible apparatus. We got used to that as a 20th-century arrangement, whether it was in newspapers or film or in television. But now that whole second half of — “We will also manage the circulation of this content” — is fractured enough that it’s just much harder to put the financial and emotional investment in the first half.

O’Donovan: I’m thinking of this point you make about how algorithms can change very rapidly — in fact, they’re always changing, they’re always learning, and they’re never exactly the same thing to any two people, and they’re never the same thing one day after the other.

Then you have BuzzFeed, and what they’re doing with their data input and analytics, which as I said is about as close to gaming a social algorithm as you can get. How is BuzzFeed doing that? And what happens when there’s this mirrored back and forth?

Let’s say, for example, Facebook decides they want to downplay clickbait headlines. Theoretically, according to what BuzzFeed says about itself, they’re going to notice that. They’re going to notice that that trick is no longer working, and they’re going to come up with a new trick, and then the algorithm would have to change in reaction to that. Is that a logical characterization of that feedback loop? And is there any way to change it?

Gillespie: I think it is, as best as I know. In some ways, BuzzFeed is a creature of the kind of algorithmic delivery of information. It’s not so far from search engine optimization. It’s put a lot of investment into watching the circulation of its stories, trying to figure out what gets circulated, and then tweaking it. If Facebook changes its algorithm, it hopes it will discern that and come up with a different theory.

The other way to think of it is, they’ve got two forces to factor in. They’ve got to figure out Facebook’s algorithm, but they’ve also got to figure out the audience. For their stuff to drop off — let’s say they see a lag in the previous month — is that because Facebook tweaked their algorithm? Because people were less interested? Is that because they didn’t have as many interesting stories? Because no celebrities did anything embarrassing that month? It’s very hard to discern this, and that’s something cultural producers have had to do for a long time. Why didn’t people come to this movie? Was it that it was terrible? Was it that word of mouth was bad? Was it a bad weekend? Is it the mechanism by which the movie gets to the people, or was it the content? There’s a thin line between gaming the algorithm and trying to be appealing.

The funny thing about clickbait as an idea is it’s basically shorthand for: People really wanted to read this. Writing a really juicy headline to get people to read it, whether you got the substance or not, is not new to BuzzFeed and Upworthy. Is that gaming the algorithm? Was the algorithm of the penny newspaper — “You can see the front page on the shelf, and you can’t see the content in it, so those words better be big and gripping and delicious”? Is that gaming the algorithm for how newspapers were sold? Or is that just trying to get people to read your paper?

The last part of this is, as BuzzFeed has shown, if you’re beholden to the algorithm, what you do is not just sit there and try to guess the algorithm — you go and you meet with Facebook, and you strike a deal. That’s the real story — who’s going to get to strike the deal with providers, such that their stuff continues to stay on the network, or continues to be privileged.

O’Donovan: I don’t know if BuzzFeed would admit that that’s exactly what the nature of their deal with Facebook is.

Gillespie: Right, but they’re not stupid. Looking to figure out how to stabilize that relationship is a lot smarter than trying to ride it, and hope that you understand the workings underneath.

O’Donovan: Publishers are frustrated because they feel like they’re not reaching the number of people that they used to on a platform like Facebook. They say: The algorithm isn’t meant to serve us, the algorithm is bad, the algorithm doesn’t respect good journalism.

But then a guy who works on Facebook’s ad product had a blog post about the state of media saying all media is garbage these days and why don’t we have good journalism anymore. Then there’s Alexis Madrigal of The Atlantic saying, among many others, what do you mean? We’re doing the best we can here, but we can barely get any play on your platform as it is, and if we just did serious investigative journalism, no one would ever read it, and it’s your fault.

How did we end up in a situation like that, and what can we do about it?

Gillespie: I find myself regularly wanting to go back to: These are not new difficulties. We had this big shift in television news where for a long time, in the Cronkite and Murrow age, you couldn’t be a television network without having a flagship news program that had gravitas and journalistic traditions and all that. We had what some people might point to as the Golden Age of Television News. And then at some point, a number of networks under various kinds of economic pressures said, We can’t afford a loss leader anymore. That was one of those moments where the call on one side — We desperately need to have good journalism! — runs up against a distributor, a network in the traditional sense, that says, We have other competing priorities.

Now, it plays on slightly different lines. It’s not: We have to have a slate of programming that will draw a big audience. It’s: We have a platform that calculates what people do and then responds to that. That means they can point to things differently. Instead of saying, we’ve got to make a buck at the end of the day — which of course they do — they can say, Look, it’s a user-driven mega-community.

Now, that’s a misrepresentation, I think, of the decisions that go into what the algorithm displays in the first place. But it’s not so different — the entity that helps deliver the news is not the same as the news, and their interests and their understanding of what they should be doing, and their commercial pressures, and how they came to do what they do, is not the same as having come into a kind of journalistic project from a journalistic standpoint.

Now, a solution? That’s harder. Do you call on these networks, on Facebook — and this is what Alexis Madrigal and Upworthy are doing — and say, You’ve got to look closer at the choices you’re making about your algorithm, because you are in fact putting us at a deficit and you shouldn’t? And when you say shouldn’t, shouldn’t according to what? According to some public obligation? It’s not clear that we expect Facebook to be a public service, even in the way that we expected NBC to be one.

O’Donovan: What if we had a news service where you can request a bespoke algorithm? You can tell it what you like to read, what you think you like to read, and it can watch your behavior, and you can tell it how you want to better yourself — you allow it to decide what the parameters of better or smarter are.

Gillespie: We’ve traditionally asked people what they want, and then sometimes given them what we think they should have, but we haven’t really said, if you think you should have something, what is it and can we help give it to you. That’s kind of cool, I like that.

O’Donovan: There’s always a problem of — does length make something serious? Well, no, not really. Does a heavily titled byline make something smarter? Well, no. At the end of the day, we want to put our faith in someone else’s judgment — and aren’t we then just back at homepages?

Gillespie: I like the idea because it puts the algorithm in the service of both the public interest and the user, and tries to bring those interests onto the same page. It does bring up the issue that it’s very hard to know something about content except what computers know well — things like length and source and date and keywords.

But that last point is exactly right — we are in an information environment, and we always have been, where the best possibility of us being informed and thoughtful and ready to be participants in a democracy has always depended on other people. It’s always depended on other people to be closest to the information we need, which makes them risky, because they’re biased or subjective or emotional, but more importantly because they have to be participants in an institution that has to sustain itself, whether it’s a newspaper organization or a TV network or a social media platform. That raises all sorts of problems too, about why they’re really bringing the information to us that they are. I don’t think we can really get away from that.

So the only thing we can do is to continue to demand that these services provide what we think we need, the “we” being both individual and collective, and keep paying attention to the way they have structural problems in doing so. Whether that’s algorithm or editorial acumen or yellow journalism, these are just the kinds of problems that emerge when institutions try to produce information on a public and commercial basis across a technical platform. We’re just facing the newest version of that.

Image of an algorithm by Manu Escalante used under a Creative Commons license.
