April 29, 2014, 12:19 p.m.

Q&A: Clark Medal winner Matthew Gentzkow says the Internet hasn’t changed news as much as we think

“It remains true that the fixed costs of producing good news are still really high. It’s easy to put up a website, but to produce original reporting news content is still really expensive.”

The Clark Medal is one of the most prestigious awards in all of academia, awarded to the “American economist under the age of forty who is judged to have made the most significant contribution to economic thought and knowledge.” (Names you might know among previous winners: Paul Krugman, Milton Friedman, Joseph Stiglitz, Steven Levitt, and Larry Summers.) This year’s honor went to Matthew Gentzkow of the University of Chicago’s Booth School of Business. Gentzkow is a pioneer in the field of media economics; his work, often co-authored with Chicago Booth’s Jesse Shapiro, takes advantage of previously unavailable data on audience, content, and media impact. Austan Goolsbee, also a Chicago Booth professor, commented on Gentzkow’s work in The New York Times:

“Before the Internet and advances in computing power, this couldn’t be done,” Mr. Goolsbee said. “You couldn’t analyze the data and you wouldn’t have had the ambition to try.”

Some of Gentzkow’s most talked-about research has been on bias in news sources — he’s written papers on measuring slant, on whether readers consume diverse or confirmatory news, and on whether there is demand for biased news in the market. He’s looked at the impact of television on children and on voting behavior, and he’s studied online advertising.

Going forward, Gentzkow said he’s interested in looking at more international media — he’s focused on finding a comprehensive data set for global media content. He’s also excited about the potential for data created by geocoding and cellphones, as well as studying media impact on the individual level — maybe even with electrodes. We talked about the cost of information gathering, the demand for quality news, and the obstacles to gathering data; here’s our lightly edited conversation.

Caroline O’Donovan: Congratulations! I think I read that you now have a one-in-three shot of winning a Nobel. My question is: Can you build a predictive model that tells us what year you’re going to win the Nobel?

Matthew Gentzkow: I think I should refrain from speculating on that. The scary implication of this kind of thing is you don’t want to be remembered as the one guy who won this prize and then didn’t do anything very interesting afterward. One might think, if you’re lucky enough to win an award like this, then you can kick back and relax. But it doesn’t really feel like that. It feels like now I have a lot of work to do to try and live up to this vote of confidence from my colleagues.

O’Donovan: This is one of those wunderkind awards that specifically exists to make you feel like you have a lot of work left to do.

Gentzkow: I don’t know if 38 years old still counts as a “kind,” but I’m happy if it does. I think there is some notion in awards like this of recognizing people while they’re still working, as opposed to once it’s all done.

O’Donovan: But joking aside, the whole idea of all this new, deeper data being available — that’s not going away, right? So there’s certainly a lot left for you to get into.

Gentzkow: Oh, absolutely. It’s an incredibly exciting time to be involved in economics — to be involved in science broadly. There’s more and more data every day. I think what everybody will be able to do 10 years from now will make this year look kind of puny.

The challenge is trying to keep up, keep close enough to the frontier, keep learning new things, keep up with all these smart graduate students who are getting their PhDs and know a lot more than I do. Try to keep producing new research. It’s challenging, but certainly the data and the technology are going to keep getting better, and that makes it exciting.

O’Donovan: To dial it back a little bit, what made you decide that media economics was something you were interested in? I assume it wasn’t, “The data around this issue is going to explode, and I want to be the guy that was known for taking advantage of that.” So how did you get interested? What were the big questions that were driving you?

Gentzkow: It was certainly not as far-sighted as that. The immediate thing was: I’m a graduate student, I need to find a topic for a dissertation so I can get my degree and get a job. So for me, like a lot of people, it came out of this process of casting around, looking for topics, talking to your advisor. Once I stumbled on it, it was a really good fit. There was a mix of interesting, rich economics that it seemed like other economists might find interesting, but also this broader set of political and social questions. Media is in some sense a market like any other market, but it’s interesting above and beyond the usual reasons because of the way it affects the political process. It’s something that the typical American spends three or four hours a day doing.

I never worked in business — I didn’t do consulting or investment banking. Some of the things people traditionally work on, I didn’t have exposure to. Newspapers and TV and the Internet were things I felt like, as a consumer, I had some intuition about, thought about, found myself asking questions about. It was a good fit for me to work on something that had already piqued my curiosity.

O’Donovan: Are there areas of it that you feel especially excited about, getting the answers to some of those questions?

Gentzkow: Things that I would love to work on and other people are working on — one is ongoing changes in online media. So things like social media and how that’s changed the landscape of where people get information and how. The way the business of media has changed online continues to be a really challenging and exciting question. A second is understanding online advertising markets and how they work, with this big question in the background: Has the business of journalism changed in a way that we’re no longer going to be able to support it? How is that going to play out at local levels? National levels?

And a third thing is how similar sorts of things play out in different countries around the world. Whether the U.S. media is a little bit conservative or a little bit liberal, that’s sort of important. But what’s happening in Russia, in China, in the Middle East, what happened in the Soviet Union, in communist countries — in those sorts of settings, the impact is an order of magnitude bigger in some ways.

O’Donovan: Can you walk me through the difference in availability for those micro U.S. questions versus the more international questions? Where would that information come from? How do you get it? What are the difficulties or challenges to getting it?

Gentzkow: So, if you want to look at news text currently, say in the last year across lots of different countries, that is already easily available. Google News has sites for lots of different countries. Part of what’s really exciting is, it’s sitting right there.

Now, doing that in practice is a little harder. Jesse [Shapiro] and I several years ago had a project where we were trying to aggregate news content from lots of different countries, partly with some help from Google News, and the computational challenges, the challenges of getting everything into a form where it was clean enough that you could do something with it, proved to be pretty hard. We ended up putting that project on the back burner because we couldn’t quite get it all to come together.

O’Donovan: Google is cooperative with that kind of research?

Gentzkow: They tend to be cooperative. Google has a history of being very cooperative with researchers, at least to the extent that it doesn’t impose some huge cost or burden on them. They were very helpful about letting us access the database from Google News of the news stories they had archived each day, so we could go out and scrape the text of those things. That was really due to one engineer there who used some of his free time to set that up and do it. So I’ve found them to be extremely helpful. Obviously, it’s a business, so they’re going to be more reluctant to do things that require huge costs on their part.

O’Donovan: So you could scrape everything on a day and keep it all?

Gentzkow: We could scrape it and keep it. There are some sensitive copyright issues around them giving us directly the archive of text from all of those sites; they were giving us the URLs and we were going out and storing the HTML text from those URLs ourselves. Again, this is an example of a project we never actually figured out how to do well enough to write a paper about it.
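The scraping step he describes — taking a list of article URLs and archiving the raw HTML yourself — can be sketched in a few lines. This is only an illustration of that kind of archiving pipeline, not the tooling Gentzkow and Shapiro actually used; the file names and directory layout here are invented.

```python
# Minimal sketch: fetch each URL from a list and store its raw HTML locally.
# Hypothetical file names ("urls.txt", "news_archive/"); not the actual pipeline.
import hashlib
import pathlib
import urllib.request

ARCHIVE_DIR = pathlib.Path("news_archive")
ARCHIVE_DIR.mkdir(exist_ok=True)

def archive_url(url: str) -> pathlib.Path:
    """Fetch a page and store its raw HTML, keyed by a hash of the URL."""
    with urllib.request.urlopen(url, timeout=30) as response:
        html = response.read()
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    path = ARCHIVE_DIR / name
    path.write_bytes(html)
    return path

if __name__ == "__main__":
    with open("urls.txt") as f:  # one URL per line
        for line in f:
            url = line.strip()
            if not url:
                continue
            try:
                print(url, "->", archive_url(url))
            except Exception as err:  # keep going past dead links
                print(url, "failed:", err)
```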

Somebody just showed me a website [1] which is not primarily academic, where they actually have a very large number of sites around the world. They’re scraping them, categorizing them, and backing out from them automated measures of what events are happening — where and when — and mapping them. It all sounded very exciting.

O’Donovan: When you’re thinking about chunking media types — you have some studies that are about newspaper content, and then some about broadcast and television, and then digital — how do you think about breaking those things down and making them comparable, if they can be comparable at all?

Gentzkow: Well, video is really hard. Obviously, automated content analysis of video is something we’re still not very good at. Google’s working hard on that problem, so you can search for things, but that’s beyond my abilities.

But in terms of text, I think in the digital space, pretty much everybody’s competing with everybody, so it makes sense to think of that as one market, whether it’s ABC.com or NYTimes.com or NPR.org. Whatever traditional media you’re coming from, once you’re putting content online, you’re competing in the same marketplace. Newspapers in the 19th century, TV in the 1950s, daily print newspapers in the U.S. in the mid-2000s — that’s something different.

There is a theme running through this work: that the differences across media (in the sense of medium) are smaller than people often imagine. A lot of the underlying economics is the same online as it was for print newspapers and TV, and as it was in the 19th century. I think that’s part of the lesson that comes out of all of this — that maybe, things don’t change quite as much as we think.

O’Donovan: Can you give an example, or expand that a little bit?

Gentzkow: One of the projects Jesse Shapiro and I worked on, the study of ideological segregation online — the motivation for that paper was, there’s been a lot of discussion about the idea that because there’s so much variety available online, it’s going to allow people to self-segregate. Conservatives only look at conservative stuff and liberals only look at liberal stuff and neo-Nazis only look at neo-Nazi stuff and vegetarians only look at vegetarian stuff. Nobody gets any information that contradicts them.

The purpose of the paper was very simple: Let’s go look at some data on the way people actually consume news online and see to what extent that’s true. Conclusion: not nearly as much as you might think.

If you ask why not, the answer is because the Internet is not all that different from any other medium. The key thing driving low segregation online is that most people get most of their news from a very small number of sites. They get their news from CNN.com or Yahoo.com, NYTimes.com, Fox News — a huge share of news consumption is a small number of big sites that are very much in the middle of the spectrum in terms of their audiences.

Why is that true? Why haven’t we instead seen something a little more like the scenario Cass Sunstein was talking about, where everybody reads their own niche site and there are thousands of different niches and each person is in one of them? Because it remains true that the fixed costs of producing good news are still really high. It’s easy to put up a website, but to produce original reporting news content is still really expensive. Creating a website like CNN.com that covers everything that’s going on and that people trust and believe in is hard, is expensive.

So you end up with, just like in lots of other media markets, a small number of firms controlling a large share of the market. Those firms that invest all that money in quality are not going to do that and then cater to the neo-Nazi vegetarian tiny little corner of the market. They’re going to position themselves in the center to appeal to a wide audience.

The economics that drove the finding in that paper, I think, are the same economics that explain why we see what we do in TV and why we see what we do in print newspapers. The details are different, the cost structure is different, but basically the production of news remains not actually all that different. That shapes in a big way the outcomes that we see.

O’Donovan: That’s interesting, because that puts the reporting at the center of the cost to the news company, which I don’t think we talk about much.

Gentzkow: News companies are doing a few different things that are distinct. One is producing information — that is, reporters going out, collecting information, writing stories. A second is filtering and interpreting it — picking which one of the 25 stories we’re going to put on the front page. And a third is delivering it to people, physically, through the wires into their TV or throwing it on their doorstep with the print newspaper.

The Internet has dramatically changed the technology for delivering information to people, and it’s also pretty dramatically changed the extent of competition and filtering and interpreting information. But it really hasn’t changed all that much the production of news. If we want to learn what’s happening in Afghanistan, pretty much somebody has to go to Afghanistan and put their lives at risk and take photographs and interview people.

Those things have changed some — yes, there’s crowdsourcing, people upload videos from their phone. And yes, lazy reporters can just sit at home and do research on Google, where they don’t actually have to go down and sit in a city council meeting. But I would say relative to changes in other parts of the business, the way reporting has changed is much smaller. I think producing good news stories remains something that’s very costly, requires a lot of skill, a lot of talent. That remains the scarce resource. That explains why the Internet isn’t quite so different as we might have thought.

O’Donovan: I guess the fear you sometimes hear there is that, if there’s a slow attrition of quality over time and the reader no longer expects the same consistent level of quality news, then that demand might actually be disrupted.

Gentzkow: Like I said, I think to what extent the ability of the market to support high quality journalism has changed, or is changing, or will change, is a really important question, and not one I know the answer to.

I am more optimistic than some people about it. I don’t really buy the view that we train consumers to care about quality or not care about quality. I think the desire of people to know what is going on in the world from a source that they trust, that they believe is accurate, is a feature of humanity that’s been there for a long time. People in the Roman Empire cared a lot about getting the news, people in Medieval Europe cared a lot about getting the news, people in the 1920s cared a lot about getting the news, people today care a lot about getting the news.

O’Donovan: How does what you’re talking about — the demand for quality news — fit into the work you and Jesse have done on the consumer demand for biased news?

Gentzkow: There’s a really important clarification with that paper. The media slant that we’re measuring in that paper has no notion of good or bad attached to it. We are measuring based on the phrases that newspapers use, based on their content, which newspapers are to the right or left of which other newspapers. There is no notion in that paper of more or less slant, or more or less bias. All we can do is line people up from left to right.

What we’re picking up are decisions like: We have to call these people either undocumented workers or illegal aliens. Both of those terms are loaded, both have strong political connotations, we have to pick one or the other. People might debate this, but in my view there’s no such thing as the objective, correct term. Which decision you make will put you either to the left or the right, but it doesn’t make you better or worse or more or less accurate.
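As a rough illustration of the kind of phrase-based measurement he describes — lining outlets up from left to right by their relative use of politically coded terms — here is a toy sketch. It is not the actual Gentzkow–Shapiro methodology (which ties phrase frequencies to congressional speech); the phrase lists and outlet text are invented, and only the resulting ordering would carry meaning.

```python
# Toy illustration of phrase-based slant scoring. Not the Gentzkow-Shapiro
# method; phrase lists and outlet text below are made up for the example.
RIGHT_CODED = ["illegal aliens", "death tax", "war on terror"]
LEFT_CODED = ["undocumented workers", "estate tax", "war in Iraq"]

def slant_score(text: str) -> float:
    """Return a score in [-1, 1]: negative leans left-coded, positive right-coded."""
    text = text.lower()
    right = sum(text.count(p) for p in RIGHT_CODED)
    left = sum(text.count(p) for p in LEFT_CODED)
    total = right + left
    return 0.0 if total == 0 else (right - left) / total

# Rank hypothetical outlets; only the relative ordering is meaningful.
corpus = {
    "Outlet A": "Lawmakers debated the estate tax and undocumented workers...",
    "Outlet B": "Coverage focused on illegal aliens and the death tax...",
}
for outlet, text in sorted(corpus.items(), key=lambda kv: slant_score(kv[1])):
    print(f"{outlet}: {slant_score(text):+.2f}")
```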

Saying that newspaper slant is driven by the readers doesn’t mean that catering to the readers is making newspapers worse or more biased or less accurate or lower quality. It just says: These differences we see, that some sound way to the left and some sound way to the right, are shaped by making the decisions that will appeal most to those readers.

There’s a separate question that we don’t take up in that paper, which is: How does catering to readers affect quality? For example, maybe really all that people want to read about is celebrity gossip and scandals and local crime, and media end up covering those things to the exclusion of political debates or something else that you think might have a valuable social effect. Does catering to consumers make media more lowbrow or more highbrow?

I think local crime is actually pretty important; political scandals are an important part of politics. Judging what news content is good for society and what news content is bad for society is a little bit of a tricky business. But I think it’s still a really interesting question.

O’Donovan: I was going to say, how do you even measure what’s highbrow or lowbrow? It dovetails interestingly with this trend toward explanatory journalism, because it’s the difference between content and tone. We’re used to tone reflecting content — The New York Times uses these fancy titles for people, because it’s the Times and it’s good journalism. When we mess with that, what are we saying to our readers?

Gentzkow: There are ways to measure highbrow versus lowbrow. You can measure the length of the words that you’re using, you can measure how dense the text is, you can look at what kinds of words tend to be used by outlets with highly educated readership versus less educated readership. I think the challenge is getting some measure like that that you’d be willing to attach normative meaning to and say “higher is better for society.” I think that is very difficult. It’s not clear that the world would be better if all media was designed to appeal to people with PhDs. I think in that world probably nobody would look at media — nobody would learn anything.
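The surface measures he mentions — word length, how dense the text is — are straightforward to compute. A minimal sketch follows, with no claim that higher values mean “better for society”; the example text is invented.

```python
# Minimal sketch of surface "highbrow vs. lowbrow" text measures:
# average word length and words per sentence. Illustrative only.
import re

def text_measures(text: str) -> dict:
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "words_per_sentence": len(words) / len(sentences),
    }

print(text_measures("The court's ruling was unambiguous. Analysts disagreed sharply."))
```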

O’Donovan: I know you have to run, so just one more question. I really liked the language Austan Goolsbee was using to talk about you to other reporters — he called the data sets that you were using “unfathomable” and said these were “unprecedentedly grand ambitions.”

For you, is there a dataset out there — maybe it exists, maybe it doesn’t, maybe you know where it is, maybe you don’t — but is there something, if it was quantifiable, that you’d want as your next dataset?

Gentzkow: It’s a good question. Really being able to see all the media content being produced across all the countries in the world is one.

I think things at the individual level give you more insight into how people are reacting. Ideally, the hypothetical dataset is to look inside everybody’s head and see their beliefs and how they’re thinking about things. So maybe we can put electrodes in people’s brains and come up with a way to measure that directly.

Another thing that’s out there is all this geocoded data coming from the fact that everybody’s cellphone now tells you where everybody is every minute of the day. I don’t know what I’m going to do with it, but that’s going to be a huge area of research going forward.

1. The GDELT Project — “a recent example of people aggregating text from around the world; it illustrates the potential”