Aug. 22, 2016, 11:01 a.m.
Reporting & Production
LINK: docs.google.com   |   Posted by: Shan Wang   |   August 22, 2016

When your news organization publishes data stories, does it always publish a “nerd box” alongside them, explaining the methodology behind the analysis and detailing decisions made along the way? Does it publish the complete raw data set, in its naked glory? Or does it publish a cleaned-up version of the data? Or nothing at all?

Christine Zhang, a 2016 Knight-Mozilla OpenNews Fellow based at the Los Angeles Times’ data desk, and Dan Nguyen, who teaches computational journalism at Stanford, want to hear directly from people working in newsrooms about the decisions behind making data public (or not). The (qualitative) survey is here, and tries to get at how data and methodology are shared (GitHub? Jupyter? Google Drive? Dropbox?), and why (Increases authoritativeness? Improves internal workflow? Ensures accuracy of the analysis?).

“Dan and I both have academic and journalism backgrounds. And for us, data journalism seemed to be very much tied to social sciences, and examining data to find stories definitely has parallels with the way that social scientists work with data to write papers and provide conclusions,” Zhang, who was previously a research analyst at Brookings, said. “We started thinking about how in social sciences, peer review is the way people check their work. How do we check our work as data journalists, as people in the newsroom who tell data stories? Our research is about that nerd box, examining the transparency and openness that goes with data stories.” (Zhang recently moderated a SRCCON session with Ariana Giorgi on peer reviewing data stories.)

Part of their research includes a quantitative analysis of GitHub repos from accounts associated with news organizations. ProPublica’s Scott Klein created a bot that tweeted every time a news organization posted a GitHub repo, and Zhang and Nguyen pored over the list of organizations and the people affiliated with them, filtering out non-data repos, like web development frameworks, that also get posted to GitHub.
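Zhang and Nguyen worked from the bot’s output and reviewed repos by hand, but the general idea of the filtering step can be sketched in a few lines of code. The snippet below is an illustration only, not their actual pipeline: the organization name, the keyword lists, and the heuristic for what counts as a “data” repo are all assumptions made up for this example; it simply pages through an organization’s public repos via the GitHub API and keeps the ones that look like story data rather than web tooling.

```python
# Illustrative sketch only -- not Zhang and Nguyen's method.
# The org name, keyword lists, and filtering heuristic are assumptions.
import requests

GITHUB_API = "https://api.github.com"


def list_public_repos(org: str) -> list[dict]:
    """Page through an organization's public repositories."""
    repos, page = [], 1
    while True:
        resp = requests.get(
            f"{GITHUB_API}/orgs/{org}/repos",
            params={"type": "public", "per_page": 100, "page": page},
            headers={"Accept": "application/vnd.github+json"},
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        repos.extend(batch)
        page += 1
    return repos


# Hypothetical keyword heuristics: keep repos that look like story data,
# drop repos that look like web frameworks or site tooling.
DATA_HINTS = ("data", "analysis", "csv", "survey", "notebook")
TOOLING_HINTS = ("framework", "boilerplate", "template", "plugin", "theme")


def looks_like_data_repo(repo: dict) -> bool:
    """Crude name/description check standing in for manual review."""
    text = f"{repo['name']} {repo.get('description') or ''}".lower()
    if any(word in text for word in TOOLING_HINTS):
        return False
    return any(word in text for word in DATA_HINTS)


if __name__ == "__main__":
    org = "example-newsroom"  # placeholder, not a real account from the study
    candidates = [r["full_name"] for r in list_public_repos(org) if looks_like_data_repo(r)]
    print("\n".join(candidates))
```

In practice this kind of keyword filter only narrows the list; the study itself relied on people looking at each repo, which is why a heuristic like the one above should be read as a starting point rather than a classifier.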

“Our goal is essentially to look at general trends in data being put up on GitHub publicly, looking at which organizations are doing it more consistently and which are not, the types of stories that tend to merit that sort of consideration,” Zhang said. (BuzzFeed News, for example, regularly creates GitHub repos for its investigations and data stories.) “This is why we wanted to launch the qualitative survey as well: to get some commentary in addition to the data that we have. I don’t think this can be representative by any means, but we’d like to collect as many survey responses as we can get, to understand also how newsrooms are sharing their data outside of GitHub.”

Photo by JustGrimes used under a Creative Commons license.
