Nieman Foundation at Harvard
Feb. 2, 2022, 9:16 a.m.

How UC Berkeley computer science students helped build a database of police misconduct in California

When newsrooms, especially local ones, are strapped for engineering resources, the Berkeley students fill a gap to help journalists complete more ambitious data projects.

In 2018, California passed the “Right to Know Act,” unsealing three types of internal law enforcement documents: use of force records, sexual assault records, and official dishonesty records.

Before the passage of the law, Senate Bill 1421, California had some of the strictest laws in the United States shielding police officers’ privacy, according to Capital Public Radio, and police misconduct records were deemed “off-limits.”

Six news outlets — Bay Area News Group, Capital Public Radio, the Investigative Reporting Program at the University of California, Berkeley, KPCC/LAist, KQED, and the Los Angeles Times — got together to request those documents, forming the California Reporting Project. Now, 40 news outlets are part of the initiative.

They sent public records requests to more than 700 agencies across the state, from police departments and sheriffs’ offices to prisons, schools, and welfare agencies that have police presence on site. If you’ve ever submitted a records request to a government agency, you know it’s not easy or straightforward to extract information from documents, if you can even get them at all.

To sort through the more than 100,000 records they’ve gotten back since 2018, Lisa Pickoff-White, KQED’s only data reporter and the data lead on the California Reporting Project, enlisted data science students from UC Berkeley to help organize the data.

The Data Science Discovery Program was founded in 2015 and is part of Berkeley’s Division of Computing, Data Science, and Society. Every semester, the program pairs around 200 students with companies and organizations that have data science–related projects they need help completing. Students spend six to 12 hours a week working on their assignments, for which they receive course credit.

The students have worked with media companies on editorial and operational projects, including the San Francisco Chronicle’s air quality map and the Wall Street Journal’s effort to analyze its source and topic diversity using natural language processing. When newsrooms, especially local ones, are strapped for engineering resources, the Berkeley students fill a gap to help journalists complete more ambitious projects.

“It’s a really natural fit. [We want] students to get a deep understanding of the context of the data analysis that they’re doing, and to consider human context and the implications of the insights and conclusions they’re making,” Data Science Discovery program manager Arlo Malmberg said. “All the things we emphasize in the data science program are at the core of what journalists do as well, in bringing forward the context of a problem in a story for readers, and in providing analysis of the causes of those issues.”

Pickoff-White co-selected four students to work with the California Reporting Project to build a police misconduct database from the records received. They all had particular interests in policing because of various connections in their personal lives. Usually in their data science courses, she said, they work individually on assignments and applications, but they were excited to work as a team on something tangible.

“The purpose of the project really resonated with me,” Pruthvi Innamuri, a sophomore computer science major who worked on the project, said. “During 2020, with a lot of police misconduct happening, I noticed a lot of communities feeling severely hurt and oppressed. I wanted to be able to use my computer science background to work on a project that’s able to better inform people in some way regarding this issue.”

Innamuri and his classmates built programs to recognize basic information from the police records, like names, locations, and case numbers. That made it easier to group files together and organize data for the journalists to analyze.
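
The article does not describe how the students' programs work, so the following is only a minimal sketch of one piece of that idea: pulling case numbers out of document text with a regular expression and grouping files that mention the same case. The case number format, the `CASE_NUMBER` pattern, and the function names `extract_case_numbers` and `group_by_case` are all assumptions made for illustration. Recognizing names and locations would likely need a different technique, such as named-entity recognition, and is not shown here.

```python
import re
from collections import defaultdict

# Assumed format (hypothetical): "Case No. 18-001234", "case #2019-0456",
# or "Case Number: 2019-0456". Real agency records vary widely.
CASE_NUMBER = re.compile(
    r"case\s*(?:no\.?|number|#)\s*:?\s*(\d[A-Z0-9]*(?:-[A-Z0-9]+)*)",
    re.IGNORECASE,
)


def extract_case_numbers(text):
    """Return distinct case numbers in text, in order of first appearance."""
    seen = []
    for match in CASE_NUMBER.finditer(text):
        number = match.group(1).upper()
        if number not in seen:
            seen.append(number)
    return seen


def group_by_case(documents):
    """Map each case number to the sorted file names that mention it.

    `documents` is a dict of file name -> extracted text.
    """
    groups = defaultdict(list)
    for name, text in documents.items():
        for number in extract_case_numbers(text):
            groups[number].append(name)
    return {number: sorted(names) for number, names in groups.items()}
```

Grouping on an extracted identifier like this is what lets scattered files from one incident be reviewed together. In practice the text would first have to be extracted from scanned records, and a single pattern would not cover every agency's numbering scheme.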

Stories that have come out of the records include a Mercury News story about how Richmond has more police dog bites than other cities and another about how Bakersfield police officers broke 45 bones in 31 people in the span of four years. The database isn’t complete yet, and the students’ work makes future data collection easier.

“I don’t know if we’d be able to do this without them,” Pickoff-White said. “None of these newsrooms would be able to automate this work on their own.”

Photo by Lagos Techie.

Hanaa' Tameez is a staff writer at Nieman Lab. You can reach her via email (hanaa@niemanlab.org) or Twitter DM (@HanaaTameez).