Nieman Foundation at Harvard
March 25, 2016, 10:20 a.m.
Reporting & Production

Think you’re bad at math? A new Tow Center report explores the principles behind data journalism

“The future is very hard to know, but standards of journalistic accuracy apply to descriptions of the future at least as much as they apply to descriptions of the present, if not more so.”

Are you the sort of journalist who believes you’re doomed to be “bad at math”? Journalist and data scientist Jonathan Stray’s new Tow Center report, The Curious Journalist’s Guide to Data, is less a how-to guide to data journalism and statistics than a book about the fundamentals of how journalists use, and could use, data, illustrated with real-life examples. (There’s still some math, of course, and some discussion of statistical principles.) It’s readable and clear, and its dedication is fitting: “For every journalist who has ever thought they’re bad at math. What if you’re wrong?”

The report divides its discussion of data into three sections, starting at the very core of what data is, with questions about the idea of “quantification.” For instance, here’s Stray on measurement error:

In practice, nothing can be measured perfectly.

A random sample has a margin of error due to sampling, but every quantification has error for one reason or another. The length of a table cannot be measured much finer than the tick marks on whatever ruler you use, and the ruler itself was created with finite precision. Every physical sensor has noise, limited resolution, calibration problems, and other unaccounted variations. Humans are never completely consistent in their categorizations, and the world is filled with special cases. And I’ve never seen a database that didn’t have a certain fraction of corrupted or missing or simply nonsensical entries, the result of glitches in increasingly complex data-generation workflows.

Error creeps in, and the data never quite matches the description on the box. Anyone who works with data has had this beaten into them by experience.

Even simple counts break down when you have to count a lot of things. We’ve all sensed that large population figures are somewhat fictitious. Are there really 536,348 people in your hometown, as the number on the “Welcome To …” sign suggests?
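The sampling error Stray mentions can be put into numbers. Below is a minimal sketch of the textbook margin of error for a proportion, assuming a simple random sample and the usual normal approximation (the excerpt does not specify either); the poll figures are hypothetical.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to a 95% confidence level.
    """
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical poll: 52% of 1,000 randomly sampled respondents say yes.
moe = margin_of_error(0.52, 1000)
print(f"52% +/- {moe * 100:.1f} points")  # prints "52% +/- 3.1 points"
```

This captures only sampling error. The other sources in the excerpt (ruler tick marks, sensor noise, inconsistent human categorization, corrupted database entries) come on top of it and are not reflected in the number.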

Data doesn’t speak for itself, and the second section of Stray’s book, centered on data analysis and interpretation, runs through real-world policy examples, such as whether imposing earlier closing times for bars in New South Wales actually reduced drunken nighttime assaults. Stray warns against reading significance into data that is pure coincidence, and against failing to rule out other possible explanations for why the data looks the way it does.

The method of competing hypotheses need not involve data at all. You can apply the idea of ruling out hypotheses to any type of reporting work, using any combination of data and non-data sources. The concept of triangulation in the social sciences captures the idea that a true hypothesis should be supported by many different kinds of evidence, including qualitative evidence and theoretical arguments. That too is a classic idea…

What you see in the data cannot contradict what you see in the street, so you always need to look in the street. The conclusions from your data work should be supported by non-data work, just as you would not want to rely on a single source in any journalism work.

The story you run is the story that survives your best attempts to discredit it.
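One standard way to ask whether a difference could be coincidence is a permutation test: shuffle the “before” and “after” labels many times and see how often chance alone produces a gap at least as large as the one observed. This is a minimal sketch of that general technique, not the analysis used in the report’s New South Wales example; the weekly counts below are made up for illustration.

```python
import random
from statistics import mean

def permutation_p_value(before, after, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose absolute
    difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(after) - mean(before))
    pooled = list(before) + list(after)
    n_before = len(before)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign which values count as "before"
        diff = abs(mean(pooled[n_before:]) - mean(pooled[:n_before]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / n_iter

# Made-up weekly counts, for illustration only.
before = [31, 28, 35, 30, 33, 29, 32, 34]
after = [27, 25, 30, 26, 29, 24, 28, 27]
print(permutation_p_value(before, after))
```

A small value only says the gap is unlikely to be pure chance. It does not rule out the competing hypotheses Stray asks reporters to eliminate, such as something else that changed at the same time.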

In a third section, Stray explores data visualization, starting with how humans perceive and make sense of the data presented to them (“We can’t possibly study the communication of data without studying the human perception of quantities”). Communicating uncertainty around data is important, and so is making rigorously supported predictions:

Yet most journalists think little about accountability for their predictions, or the predictions they repeat. How many pundits throw out statements about what Congress will or won’t do? How many financial reporters repeat analysts’ guesses without ever checking which analysts are most often right? The future is very hard to know, but standards of journalistic accuracy apply to descriptions of the future at least as much as they apply to descriptions of the present, if not more so. In the case of predictions it’s especially important to be clear about uncertainty, about the limitations of what can be known.
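Stray’s question about which analysts are most often right can be answered with simple record-keeping. One common yardstick for probabilistic forecasts is the Brier score: the mean squared difference between the stated probability and what happened (1 if it did, 0 if it didn’t), where lower is better. This sketch assumes forecasts are recorded as probabilities; the analysts and numbers are hypothetical, and the excerpt does not say the report recommends this particular score.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect record; always saying 50% scores 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical track records: each analyst's stated probability that an
# event would happen, and whether it did (1) or not (0).
outcomes = [1, 0, 1, 1, 0]
records = {
    "Analyst A": [0.9, 0.2, 0.8, 0.7, 0.1],
    "Analyst B": [0.6, 0.7, 0.4, 0.9, 0.5],
}
# Rank analysts from best (lowest score) to worst.
for name, probs in sorted(records.items(), key=lambda kv: brier_score(kv[1], outcomes)):
    print(f"{name}: {brier_score(probs, outcomes):.3f}")
# Analyst A: 0.038
# Analyst B: 0.254
```

Scoring probabilities rather than yes/no calls also rewards forecasters who are honest about uncertainty, which is the point of the excerpt.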

The book was released at a launch event Thursday night featuring panelists Meredith Broussard of New York University, Mark Hansen of the David and Helen Gurley Brown Institute for Media Innovation at Columbia University, and Scott Klein of ProPublica. You can read The Curious Journalist’s Guide to Data in its entirety here.

Photo by Peter Renshaw used under a Creative Commons license.
