
Data, uncertainty, and specialization: What journalism can learn from FiveThirtyEight’s election coverage

Nate Silver’s number-crunching blog is perceived as a threat by some traditional political reporters — but its model has lessons for all journalists.
Oct. 31, 2012, 9:30 a.m.

Nate Silver’s FiveThirtyEight blog at The New York Times really only does one thing: It makes election predictions. But it does this differently than pretty much everyone else, because it aggregates all available polls using a statistical model calibrated with past election data. Silver has his critics among the political class, but to my eye, his approach makes pretty much all other election “horse race” coverage look primitive and uninformed.

FiveThirtyEight has obvious lessons for journalism about data-related topics such as statistics and uncertainty. But I think I also see wider implications for the evolving role of the political journalist. At heart, these changes are about the response of journalism to a world that is increasingly complex and networked.

Data literacy

Silver’s approach has had remarkable success in past elections, correctly predicting the winner in 49 of 50 states in 2008. That doesn’t necessarily mean his model is going to get 2012 right — as Silver will be first to admit — but there is at least one reason to recommend FiveThirtyEight over other sources: It takes the statistics of polling seriously. Polls are subtle creations, careful extrapolations from a small sample to an entire population. Although the basic theory is centuries old, the details are complex and tricky. See, for example, this lengthy analysis of why Gallup polls consistently favor Romney slightly more than other polls.
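To make the arithmetic concrete, here is a minimal Python sketch of where a poll’s margin of error comes from under the textbook normal approximation; the sample sizes are illustrative assumptions, not figures from any particular poll.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """95% margin of error for a simple random sample (normal approximation).

    Real polls also adjust for weighting and design effects, which
    widen this figure somewhat.
    """
    se = math.sqrt(proportion * (1 - proportion) / sample_size)
    return z * se

# A typical national poll of about 1,000 respondents:
print(f"n=1000: +/-{margin_of_error(1000):.1%}")   # about +/-3.1%
# A smaller state poll of 350 respondents:
print(f"n=350:  +/-{margin_of_error(350):.1%}")    # about +/-5.2%
```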

Silver understands all of this, and his model accounts for dozens of factors: “house effects” that make particular firms lean in particular ways, the relationships between state and national polls, the effect of economic indicators on election results, post-convention bounces, and lots of other good stuff. Yes, you can talk about all of these factors — but without quantifying them, there is no way to know whether their cumulative effect pushes a candidate up or down.
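To illustrate the kind of adjustment involved (this is not Silver’s actual model), here is a toy Python sketch of a weighted poll average that subtracts an assumed house effect for each firm and weights polls by size and recency. Every pollster name and number in it is invented for illustration.

```python
# Toy poll aggregation with house-effect adjustment. All pollsters,
# numbers, and house effects below are invented for illustration;
# FiveThirtyEight's real model is far more elaborate.

polls = [
    # (pollster, candidate_share_pct, sample_size, days_old)
    ("Firm A", 49.0, 1000, 1),
    ("Firm B", 47.5,  800, 3),
    ("Firm C", 51.0, 1200, 5),
]

# Assumed lean of each firm relative to the polling consensus, in points
# (positive means the firm tends to overstate this candidate).
house_effects = {"Firm A": 0.5, "Firm B": -0.3, "Firm C": 1.0}

def weight(sample_size, days_old, decay=0.9):
    """Larger and more recent polls count for more."""
    return sample_size * (decay ** days_old)

numerator = sum((share - house_effects[firm]) * weight(n, age)
                for firm, share, n, age in polls)
denominator = sum(weight(n, age) for _, _, n, age in polls)
print(f"Adjusted average: {numerator / denominator:.1f}%")
```

In a real model the house effects and weights would be estimated from historical data rather than assumed, which is precisely where the statistical work lies.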

Uncertainty

Recently CNN aired a chart that showed one candidate ahead 49 percent to 47 percent, and the commentators were discussing this lead. But up in the corner in small print, the margin of error of the poll was given as 5.5 percent. In other words, the size of the “lead” was smaller than the expected error in the poll result, meaning that the difference was probably meaningless.

Expected error — quantified uncertainty — is the price you pay for polling a national sample instead of asking every person in the country how they’re going to vote. It means that small variations in poll numbers are mostly meaningless “noise”: any movement smaller than the margin of error is effectively a coin toss, and you’d expect the very next poll to show the lead reversing much of the time. This 2 percent difference with a 5.5 percent margin of error would never pass standard statistical tests such as the t-test — so you couldn’t publish the result in a scientific paper, a medical board wouldn’t authorize a new treatment based on such weak evidence, and you certainly wouldn’t want to place a bet on it.
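As a rough check on that CNN example, here is a short Python sketch of the significance test the segment never mentioned. The sample size is backed out from the stated 5.5 percent margin of error, which is an assumption on my part; the point is simply that a 2-point lead is a small fraction of its own standard error.

```python
import math
from statistics import NormalDist

# Is a 49%-47% "lead" meaningful when the margin of error is 5.5%?
# The sample size below is backed out from that margin of error and is
# an illustrative assumption, not the actual poll's n.
moe = 0.055
n = 0.25 * (1.96 / moe) ** 2          # roughly 320 respondents

p1, p2 = 0.49, 0.47
lead = p1 - p2

# Standard error of the lead when both shares come from the same sample.
se_lead = math.sqrt((p1 + p2 - lead ** 2) / n)
z = lead / se_lead
p_value = 2 * (1 - NormalDist().cdf(z))

print(f"lead = {lead:.0%}, SE of lead = {se_lead:.1%}, z = {z:.2f}, p = {p_value:.2f}")
# The lead is well under half of its own standard error: statistically
# indistinguishable from no lead at all.
```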

So why do journalists spend so much energy talking about a result like this, as if there’s anything at all to learn from such a roll of the dice? One possibility is a widespread misunderstanding of the limitations of statistical methods and how to interpret measures of uncertainty. But I suspect there’s also a deeper cultural force at play here: Journalists are loath to admit that the answer cannot be known. “Unexpected Reversal in Polls” is a great headline; “Magic Eight Ball says ‘Sorry, Ask Again Later’” is a story no one wants to write — or read. To his great credit, Silver never shies away from saying that we don’t yet have enough information to know something, as when he cautioned that we had to wait a few more days to see whether the Denver debate really had any effect.

Aggregation

The big data craze notwithstanding, more data isn’t always better. However, in the limited field of statistical sampling, more samples are better. That’s why averaging polls works; in a sense, it combines all of the individuals asked by different pollsters into one imaginary super-poll with a smaller margin of error. This is the idea behind Real Clear Politics’ simple poll averages and FiveThirtyEight’s more sophisticated weighted averages.
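Here is a quick Python sketch of the “super-poll” intuition: pooling several hypothetical polls shrinks the margin of error roughly with the square root of the combined sample size. (Real aggregation can’t simply pool raw respondents, since house effects and differing methodologies get in the way; that is exactly what FiveThirtyEight’s weighting tries to handle.)

```python
import math

def moe(n, p=0.5, z=1.96):
    """95% margin of error under the normal approximation."""
    return z * math.sqrt(p * (1 - p) / n)

# Five hypothetical polls of roughly 800 respondents each.
poll_sizes = [800, 750, 900, 820, 780]
pooled_n = sum(poll_sizes)

print(f"single poll (n=800):        +/-{moe(800):.1%}")       # about +/-3.5%
print(f"pooled super-poll (n={pooled_n}): +/-{moe(pooled_n):.1%}")  # about +/-1.5%
```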

All well and good, but to average polls together you have to be willing to use other people’s polling data. This is where traditional journalism falls down. We have the ABC-WaPo poll, the AP-GfK poll, the CNN/ORC poll, and then Gallup, Rasmussen, and all the others. FiveThirtyEight shamelessly draws on all of these and more — while individual news outlets like to pretend that their one poll is definitive. This is a disservice to the user.

This situation is not unlike the battles over aggregation and linking in the news industry more generally. Aggregation disrupts business models and makes a hash of brands — but in the long run none of that matters if it also delivers a superior product for the user.

Specialization

It’s not just statistics. To report well on complicated things, you need specialized knowledge. As Megan Garber put it so well, “While it may still be true that a good magazine — just as a good newspaper — is a nation talking to itself, the standards for that conversation have risen considerably since people have been able to talk to each other on their own.” The traditional generalist education of the journalist is ill suited to meaty topics such as law, science, finance, technology, and medicine. It’s no longer enough to be able to write a good article; on the web, the best is just a click away, and the best on these sorts of subjects is probably being written by someone with the sort of deep knowledge that comes from specialized training.

Silver is a statistician who got into journalism when he began publishing the results of his (initially sabermetric) models; the reverse, a journalist who becomes a statistician when they start modeling polling data, seems like a much longer road.

Journalism today has an obvious shortage of talent in many specialized fields. I’d like the financial press to be asking serious questions about, say, the systemic risks of high-frequency trading — but instead we get barely factual daily market reports that, like most poll coverage, struggle to say something in the face of uncertainty. But then again, most finance reporters have training in neither quantitative finance nor computer science, which probably makes them unqualified for this topic. I suspect that we will see many more specialists brought into journalism to address this sort of problem.

The role of the political journalist

For the last several decades, both in the United States and internationally, “horse race” or “political strategy” coverage of politics has been something like 60 percent or 70 percent of all political journalism. Certainly, it’s important to keep track of who might win an election — but 60 or 70 percent? There are several different arguments that this is way too much.

First, it’s very insider-y, focusing on how the political game is played rather than what sort of information might help voters choose between candidates. Jay Rosen has called this the cult of the savvy. As one friend put it to me: “I wish the news would stop talking about who won the debate and start asking questions about what they said.”

Second, this quantity of horse race coverage is massively wasteful. Given the thorny problems of uncertainty and of attributing causation, can you really produce all that many words about the daily state of the race? Can you really say anything different than the thousands of other stories on the topic? (Literally thousands — check Google News.) So why not cover something else instead? I find it noteworthy that it was not journalists who crunched the numbers behind Romney’s centerpiece tax plan. That task, really nothing more than a long night with a spreadsheet, fell to think tanks.

Third and finally, FiveThirtyEight has set a new standard for horse race coverage. We should rejoice that this is a higher standard than we had before, and hopefully it represents a narrowing of the data gap between politicians and journalists. It’s also a complicated and presumably expensive process. Because many assumptions and judgment calls go into such a complex statistical model, we really do need more than one. (And indeed, there are other models of this type.) But we don’t need one from every newsroom — and in any case, you need to hire a statistician to produce a statistical model. The politics desk of the future might look a lot different than it does today.

Photo of Nate Silver by J.D. Lasica used under a Creative Commons license.