Nieman Foundation at Harvard
Aug. 22, 2016, 11:01 a.m.
Reporting & Production
LINK: docs.google.com  ➚   |   Posted by: Shan Wang   |   August 22, 2016

When your news organization publishes data stories, does it always publish a “nerd box” alongside them, explaining the methodology behind the analysis and detailing decisions made along the way? Does it publish the complete raw data set, in its naked glory? Or does it publish a cleaned-up version of the data? Or nothing at all?

Christine Zhang, a 2016 Knight-Mozilla OpenNews Fellow based at the Los Angeles Times’ data desk, and Dan Nguyen, who teaches computational journalism at Stanford, want to hear directly from people working in newsrooms about the decisions behind making data public (or not). Their qualitative survey is here; it tries to get at how data and methodology are shared (GitHub? Jupyter? Google Drive? Dropbox?), and why (Increases authoritativeness? Improves internal workflow? Ensures accuracy of the analysis?).

“Dan and I both have academic and journalism backgrounds. And for us, data journalism seemed to be very much tied to social sciences, and examining data to find stories definitely has parallels with the way that social scientists work with data to write papers and provide conclusions,” Zhang, who was previously a research analyst at Brookings, said. “We started thinking about how in social sciences, peer review is the way people check their work. How do we check our work as data journalists, as people in the newsroom who tell data stories? Our research is about that nerd box, examining the transparency and openness that goes with data stories.” (Zhang recently moderated a SRCCON session with Ariana Giorgi on peer reviewing data stories.)

Their research also includes a quantitative analysis of GitHub repos from accounts associated with news organizations. ProPublica’s Scott Klein created a bot that tweeted every time a news organization posted a GitHub repo, and Zhang and Nguyen pored over the list of organizations and the people affiliated with them, filtering out repos that aren’t data sources, such as web development frameworks, which might also be posted to GitHub.
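
For readers curious what that kind of filtering could look like in practice, here is a minimal sketch in Python. It is not Zhang and Nguyen's method, which the article does not describe; the keyword list, the `Repo` fields, and the sample repos (with placeholder newsroom names) are all hypothetical, invented purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical keywords suggesting a repo is tooling rather than story data.
# The researchers' actual criteria are not given in the article.
NON_DATA_HINTS = ("framework", "template", "boilerplate", "plugin", "library")


@dataclass
class Repo:
    owner: str        # account associated with a news organization
    name: str         # repository name
    description: str  # free-text repository description


def looks_like_data_repo(repo: Repo) -> bool:
    """Return True unless the name or description contains a tooling keyword."""
    text = f"{repo.name} {repo.description}".lower()
    return not any(hint in text for hint in NON_DATA_HINTS)


def filter_data_repos(repos: list[Repo]) -> list[Repo]:
    """Keep only the repos that pass the keyword check, preserving order."""
    return [repo for repo in repos if looks_like_data_repo(repo)]


def count_by_owner(repos: list[Repo]) -> dict[str, int]:
    """Count repos per owning account."""
    return dict(Counter(repo.owner for repo in repos))


# Made-up sample records; these are not real repositories.
SAMPLE_REPOS = [
    Repo("newsroom-a", "police-shootings-data", "Data and analysis notebooks for a story"),
    Repo("newsroom-a", "charts-framework", "Web development framework for interactive charts"),
    Repo("newsroom-b", "election-results", "Cleaned CSVs and methodology"),
    Repo("newsroom-b", "site-template", "Boilerplate for story pages"),
]

if __name__ == "__main__":
    data_repos = filter_data_repos(SAMPLE_REPOS)
    for repo in data_repos:
        print(f"{repo.owner}/{repo.name}")
    print(count_by_owner(data_repos))
```

A keyword check like this would only be a first pass. Judging by the article's description of the two researchers poring over the list, a manual review would still be needed to catch repos that a simple text match misclassifies.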

“Our goal is essentially to look at general trends in data being put up on GitHub publicly, looking at which organizations are doing it more consistently and which are not, the types of stories that tend to merit that sort of consideration,” Zhang said. (BuzzFeed News, for example, regularly creates GitHub repos for its investigations and data stories.) “This is why we wanted to launch the qualitative survey as well: to get some commentary in addition to the data that we have. I don’t think this can be representative by any means, but we’d like to collect as many survey responses as we can get, to understand also how newsrooms are sharing their data outside of GitHub.”

Photo by JustGrimes used under a Creative Commons license.
