March 21, 2018, 11:38 a.m.
Reporting & Production

Holding algorithms (and the people behind them) accountable is still tricky, but doable

“We were able to demystify this black box, this algorithm that had very scary connotations, and break it down into what ended up being a very simple linear model.”

The black box of algorithms can be impenetrable for journalists, constrained as they are by trade secret exemptions and by limited awareness of where these systems are even in use, despite the fact that algorithms can have a huge influence on both public and private life, from what you pay for airfare to whether an app declares you likely to commit a crime. But that doesn’t mean journalists stop trying.

We’ve written about algorithmic accountability before, but parsing it remains as important as ever, especially when the blackest of these boxes play such an influential role in society. Reporting on algorithms by reverse-engineering them can work, if you can get the inputs and outputs right. But sometimes just reporting the fact that an algorithm exists within a government can be a revelation. So how do you dig in?

Last month, Ali Winston reported for The Verge, in partnership with the Investigative Fund at the Nation Institute, that the city of New Orleans was partnering with Palantir, the secretive data analysis company cofounded by Peter Thiel. The city had been using a predictive policing algorithm, unbeknownst to many local elected officials and the public, Winston found.

The story landed squarely in public debate:

Two weeks ago, The Verge reported the existence of a six-year predictive policing collaboration between the New Orleans Police Department and Palantir Technologies, a data mining giant co-founded by Peter Thiel. The nature of the partnership, which used Palantir’s network-analysis software to identify potential aggressors and victims of violence, was unknown to the public and key members of the city council prior to publication of The Verge’s findings.

Yesterday, outgoing New Orleans Mayor Mitch Landrieu’s press office told the Times-Picayune that his office would not renew its pro bono contract with Palantir, which has been extended three times since 2012. The remarks were the first from Landrieu’s office concerning Palantir’s work with the NOPD. The mayor did not respond to repeated requests for comment from The Verge for the February 28th article, done in partnership with Investigative Fund, or from local media since news of the partnership broke.

There is also potential legal fallout from the revelation of New Orleans’ partnership with Palantir. Several defense attorneys interviewed by The Verge, including lawyers who represented people accused of membership in gangs that, according to documents and interviews, were identified at least in part through the use of Palantir software, said they had never heard of the partnership nor seen any discovery evidence referencing Palantir’s use by the NOPD.

Winston had reported extensively on the existence and implications of the Palantir–New Orleans algorithm, including how Palantir used the pro bono partnership’s work in a sales pitch to Chicago’s police department. That sale didn’t end up going through, but Chicago’s own predictive policing algorithm has also been subject to journalistic scrutiny, and even reverse-engineering. Rob Arthur used public records from other news organizations’ FOIA requests, along with additional information from the police department, to obtain the algorithm’s inputs and outputs.

“We had 400,000 data points with their arrest information and their scores and what we didn’t know is the middle part, the algorithm that informed this,” Arthur explained at a panel at NICAR (his slides are here). “What we did was very simple: We ran a statistical model using the input predictors that we had — arrest info and so on — and tried to predict their strategic subject list score [the predictive policing results] as a function of those predictors…We knew we had successfully reverse engineered the model because [based] on our sample data, we were able to predict the strategic subject list score extremely accurately.” They had an R-squared value of .98, meaning they “pretty much nailed what their algorithm was with just the information that they gave us,” Arthur said.
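
That kind of reverse-engineering can be surprisingly compact in code. Below is a minimal sketch of the general approach Arthur describes, fitting an ordinary linear regression to the released inputs and scores and checking the R-squared; the file name, column names, and use of scikit-learn are illustrative assumptions, not the team’s actual data or pipeline.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the FOIA'd records: one row per person, with arrest-history
# predictors and the strategic subject list score the city assigned.
# (The file name and column names here are hypothetical.)
df = pd.read_csv("ssl_foia_records.csv")

predictors = [
    "num_prior_arrests",
    "num_violent_arrests",
    "num_weapons_arrests",
    "age_at_latest_arrest",
    "num_times_shooting_victim",
]
X = df[predictors]
y = df["ssl_score"]

# The "reverse engineering" is just asking how well a linear combination
# of the known inputs reproduces the scores the algorithm produced.
model = LinearRegression().fit(X, y)
predicted = model.predict(X)

# An R-squared near 1 means the released inputs explain almost all of the
# variation in the scores -- i.e., the black box behaves like a simple
# linear model of those inputs.
print("R-squared:", r2_score(y, predicted))

# The fitted coefficients show how much each input moves the score.
for name, coef in zip(predictors, model.coef_):
    print(f"{name}: {coef:.3f}")
```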

The algorithm had apparently been developed at the Illinois Institute of Technology, a private university, so it wasn’t necessarily subject to FOIA requests, Arthur said — but journalists in Chicago still sued to get access.

“It’s very important we actually see the algorithm for itself, but even without getting that request successfully filled, we were able to demystify this black box, this algorithm that had very scary connotations, and break it down into what ended up being a very simple linear model,” Arthur said. In fulfilling other FOIA requests about the inputs and outputs, the city said it had not provided all of the algorithm’s variables, but Arthur believes it did. Still, “we don’t need to perfectly reverse engineer an algorithm to be able to say something interesting about it.”

To be fair, though, journalists should be wary of reverse-engineering an algorithm — and then getting it wrong. At NICAR, Nick Diakopoulos pointed out that having such a high R-squared value was a confidence booster in publishing. (His slides are here.)

Diakopoulos has been tracking algorithmic accountability for years, most recently as the director of Northwestern’s Computational Journalism Lab and a Tow Center fellow, and also as an occasional contributor to Nieman Lab. He also helps maintain the site algorithmtips.org as a resource for journalists parsing potentially newsworthy algorithms in the U.S. He advised interested journalists to be aware of missing information when governments are reluctant to share, to have an expectation of what an algorithm is supposed to do, and to know that it’s never one-and-done, since algorithms can always be tweaked. And remember, it’s usually humans who are doing the tweaking.

“As reporters, we really need to push to hold people accountable. Don’t let corporations say ‘it was the algorithm,’” Diakopoulos said. “You need to push on that more and find out where the people are in this system.”

Others on the panel pointed to more systems that could be audited and held accountable: targeted job listings, Airbnb results, landlord postings, hotel rankings, and more. Robert Brauneis of George Washington University conducted a study with Rutgers’ Ellen Goodman to test the limits of transparency around governmental big data analytics. They filed 42 open records requests with public agencies in 23 different states about six predictive algorithm programs. Six requests got no response; seven agencies responded initially and then didn’t follow through; two requests were caught up in the courts; and three agencies “requested large sums of money we were not able to provide,” Brauneis said. Another 12 said they did not have materials related to algorithms, five sent non-disclosure agreements they had with vendors, and six provided some materials, ranging from training sets for the algorithms to email correspondence about them.

In our coverage of NICARian discussions on algorithmic accountability four years ago, Diakopoulos offered some advice for journalists on newsworthiness and thinking critically about the machines we rely on: “Does the algorithm break the expectations we have?” he asked. “Does it break some social norm which makes us uncomfortable?”

Now, the social norm might be becoming uncomfortable.

Visualization of a Toledo 65 algorithm by Juan Manuel de J. used under a Creative Commons license.
