Nieman Foundation at Harvard
HOME
          
LATEST STORY
PressPad, an attempt to bring some class diversity to posh British journalism, is shutting down
ABOUT                    SUBSCRIBE
Aug. 22, 2016, 11:01 a.m.
Reporting & Production
LINK: docs.google.com  ➚   |   Posted by: Shan Wang   |   August 22, 2016

When your news organization publishes data stories, does it always publish a “nerd box” alongside it, explaining the methodology behind the analysis and detailing decisions made along the way? Does it publish the complete raw data set, in its naked glory? Or does it publish a cleaned-up version of the data? Or nothing at all?

Christine Zhang, a 2016 Knight-Mozilla OpenNews Fellow based at the Los Angeles Times’ data desk, and Dan Nguyen, who teaches computational journalism at Stanford, want to hear from people working in newsrooms directly about the decisions behind making data public (or not). The (qualitative) survey is here, and tries to get at how data and methodology are shared (GitHub? Jupyter? Google Drive? Dropbox?), and why (Increases authoritativeness? Improves internal workflow? Ensures accuracy of the analysis?).

“Dan and I both have academic and journalism backgrounds. And for us, data journalism seemed to be very much tied to social sciences, and examining data to find stories definitely has parallels with the way that social scientists work with data to write papers and provide conclusions,” Zhang, who was previously a research analyst at Brookings, said. “We started thinking about how in social sciences, peer review is the way people check their work. How do we check our work as data journalists, as people in the newsroom who tell data stories? Our research is about that nerd box, examining the transparency and openness that goes with data stories.” (Zhang recently moderated a SRCCON session with Ariana Giorgi on peer reviewing data stories.)

Part of their research includes a quantitative analysis of GitHub repos from news organization-associated accounts. ProPublica’s Scott Klein created a bot that tweeted every time a news organization posted a GitHub repo, and Zhang and Nguyen pored over the list of organizations and the people affiliated with those organizations, filtering out non-data source repos like web development frameworks that might also be posted to GitHub.

“Our goal is essentially to look at general trends in data being put up on GitHub publicly, looking at which organizations are doing it more consistently and which are not, the types of stories that tend to merit that sort of consideration,” Zhang said. (BuzzFeed News, for example, regularly creates GitHub repos for its investigations and data stories.) “This is why we wanted to launch the qualitative survey as well: to get some commentary in addition to the data that we have. I don’t think this can be representative by any means, but we’d like to collect as many survey responses as we can get, to understand also how newsrooms are sharing their data outside of GitHub.”

Photo by JustGrimes used under a Creative Commons license.

Show tags
 
Join the 60,000 who get the freshest future-of-journalism news in our daily email.
PressPad, an attempt to bring some class diversity to posh British journalism, is shutting down
“While there is even more need for this intervention than when we began the project, the initiative needs more resources than the current team can provide.”
Is the Texas Tribune an example or an exception? A conversation with Evan Smith about earned income
“I think risk aversion is the thing that’s killing our business right now.”
The California Journalism Preservation Act would do more harm than good. Here’s how the state might better help news
“If there are resources to be put to work, we must ask where those resources should come from, who should receive them, and on what basis they should be distributed.”