Nieman Foundation at Harvard
March 25, 2016, 10:20 a.m.
Reporting & Production

Think you’re bad at math? A new Tow Center report explores the principles behind data journalism

“The future is very hard to know, but standards of journalistic accuracy apply to descriptions of the future at least as much as they apply to descriptions of the present, if not more so.”

Are you the sort of journalist who believes you’re doomed to be “bad at math”? Journalist and data scientist Jonathan Stray’s new Tow Center report, The Curious Journalist’s Guide to Data, is not exactly a how-to guide on data journalism and statistics, but more of a book that gets down to the fundamentals of how journalists use, and can use, data, with real-life examples. (There’s still some math, of course, and some discussion of statistical principles.) It’s readable and clear, and its dedication is fitting: “For every journalist who has ever thought they’re bad at math. What if you’re wrong?”

The report divides discussion of data into three sections, starting at the very core of what data is, with questions around the idea of “quantification.” For instance, here’s Stray on measurement error:

In practice, nothing can be measured perfectly.

A random sample has a margin of error due to sampling, but every quantification has error for one reason or another. The length of a table cannot be measured much finer than the tick marks on whatever ruler you use, and the ruler itself was created with finite precision. Every physical sensor has noise, limited resolution, calibration problems, and other unaccounted variations. Humans are never completely consistent in their categorizations, and the world is filled with special cases. And I’ve never seen a database that didn’t have a certain fraction of corrupted or missing or simply nonsensical entries, the result of glitches in increasingly complex data-generation workflows.

Error creeps in, and the data never quite matches the description on the box. Anyone who works with data has had this beaten into them by experience.

Even simple counts break down when you have to count a lot of things. We’ve all sensed that large population figures are somewhat fictitious. Are there really 536,348 people in your hometown, as the number on the “Welcome To …” sign suggests?
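To make the sampling point in the excerpt concrete, here is a minimal sketch of the standard margin-of-error approximation for a proportion estimated from a simple random sample. It is not taken from Stray's report: the function name `margin_of_error`, the 95% confidence multiplier of 1.96, and the worst-case proportion of 0.5 are all assumptions chosen for illustration.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumptions (illustrative, not from the report): z=1.96 corresponds to
    roughly 95% confidence, and proportion=0.5 is the worst case.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Larger samples shrink the sampling error, but only with the square root of n.
for n in (100, 1000, 10000):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

This covers only the error that comes from sampling. The other sources the excerpt lists, such as sensor noise, inconsistent categorization, and corrupted entries, are not captured by this formula.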

Data doesn’t speak for itself, and the second section of Stray’s book, centered on data analysis and interpretation, runs through real-world policy examples, such as whether imposing an earlier closing time for bars in New South Wales actually reduced drunken nighttime assaults. Stray warns against reading significance into data when a pattern is pure coincidence, and against failing to eliminate other possible explanations for why the data looks the way it does.

The method of competing hypotheses need not involve data at all. You can apply the idea of ruling out hypotheses to any type of reporting work, using any combination of data and non-data sources. The concept of triangulation in the social sciences captures the idea that a true hypothesis should be supported by many different kinds of evidence, including qualitative evidence and theoretical arguments. That too is a classic idea…

What you see in the data cannot contradict what you see in the street, so you always need to look in the street. The conclusions from your data work should be supported by non-data work, just as you would not want to rely on a single source in any journalism work.

The story you run is the story that survives your best attempts to discredit it.

In a third section, Stray explores data visualization, starting with how humans perceive and make sense of data that’s presented to them (“We can’t possibly study the communication of data without studying the human perception of quantities”). Communicating uncertainty around data is important, and so is making rigorously supported predictions:

Yet most journalists think little about accountability for their predictions, or the predictions they repeat. How many pundits throw out statements about what Congress will or won’t do? How many financial reporters repeat analysts’ guesses without ever checking which analysts are most often right? The future is very hard to know, but standards of journalistic accuracy apply to descriptions of the future at least as much as they apply to descriptions of the present, if not more so. In the case of predictions it’s especially important to be clear about uncertainty, about the limitations of what can be known.

The book was released at a launch event Thursday night featuring panelists Meredith Broussard of New York University, Mark Hansen of the David and Helen Gurley Brown Institute for Media Innovation at Columbia University, and Scott Klein of ProPublica. You can read The Curious Journalist’s Guide to Data in its entirety here.

Photo by Peter Renshaw used under a Creative Commons license.
