Nieman Foundation at Harvard
Feb. 14, 2019, 11:21 a.m.
Reporting & Production

Acing the algorithmic beat, journalism’s next frontier

In a world where key decisions are increasingly driven by algorithms, journalists need to take a closer look at how they work and how they impact individuals and society. Here’s how The Wall Street Journal is approaching it.

Algorithms shape large parts of everyday life: our interactions with other people, what products we purchase, the information we see (or don’t see), our investment decisions and our career paths. And we trust their judgment: according to a Harvard Business School study, people are more likely to follow advice when they are told that it came from an algorithm rather than a human.

Machines make mistakes

Despite our growing reliance on algorithms, the Pew Research Center found that Americans are concerned about the fairness and effectiveness of computer programs that make important decisions in their lives: 58 percent feel that algorithms are likely to reflect some level of human bias.

And they’re right. Even though algorithms can seem “objective” and can sometimes even outperform human judgment, they are still fallible. The notion that algorithms are neutral because math is involved is deeply flawed. After all, algorithms are based on data created by humans — and humans make mistakes and have biases. That’s why American mathematician Cathy O’Neil says: “Algorithms are opinions embedded in code.”

Machine bias can have grave consequences. A hiring algorithm at a large tech company might teach itself to prefer male applicants over female applicants. Policing software that conducts risk assessments might be biased against black people. And a content recommendation algorithm might amplify conspiracy theories.

Reporting on black boxes

In a world where key decisions are increasingly driven by algorithms, news organizations are taking a closer look at how these systems work and how they impact individuals and society.

Algorithms can be difficult to explain to readers — they can require technical domain knowledge to understand, they can change rapidly, and private companies often keep the details of their operation under wraps. In some cases, even the companies or government agencies that own the algorithms might not have full visibility into their inner workings because the systems are not developed to explain their decisions. The complexity of algorithmic calculations means it can be very challenging to ascertain exactly how a certain result was reached — meaning those using the system are at some level trusting it blindly. That’s especially worrisome when we think about algorithms being used by governments to make wide-ranging decisions like assessing the safety of airlines or bridges.

Reporters investigating how algorithms work try to look inside them. That’s what “algorithmic transparency reporting” is all about: shining a light into these opaque black boxes and trying to trace the steps from input to output that cannot be seen in full. To do this, journalists are expanding their toolbox and collaborating with data scientists and technologists.

When algorithms fail, they can lead to discrimination, financial losses, privacy breaches, and more. These are all instances worth investigating for journalists. The algorithms beat is relatively young, but it’s likely to become more and more important as organizations and governments adopt algorithmic technologies more widely.

The news value of algorithms

What are the sorts of algorithms that might be of journalistic interest? Think of cases where traditional fields adopt algorithms, or algorithms spur new industries; when algorithms make mistakes or demonstrate biases; when advancements in research unlock new algorithmic possibilities; or when algorithms are being regulated by governments.

New frontiers

Algorithms are newsworthy when they begin to disrupt existing industries or launch new ones. The autonomous car industry, for example, has been driven by advances in algorithms for navigation, object detection, and other tasks. In the future, it will be relevant for journalists to document these developments, assess the scope of their economic and social impact, and evaluate potential risks in their usage of algorithms (more on this later).

The way algorithms function may also sometimes conflict with existing social values or legal norms, as outlined by Nick Diakopoulos, a professor at Northwestern University’s School of Communication, in his foundational paper “The Algorithms Beat.” Privacy, for example, is a norm that algorithms can easily violate. Vox has reported on how personalization algorithms may make recipients feel that their identity or data has been compromised.

Explicit errors

Algorithms become newsworthy most often when they make mistakes. When an algorithm deployed at scale does something that it’s not supposed to, that failure can have immense consequences and challenge the public perception that algorithms are infallible. Since algorithms by nature operate with minimal human oversight and are often perceived as objective, reporting on their failures is a necessary journalistic challenge.

Examples of algorithmic failures include when Google Translate makes incorrect translations, as reported by Mental Floss, or when Apple Maps directs drivers to the wrong location, as highlighted by Forbes. Algorithmic errors become especially newsworthy when they systematically disadvantage certain groups, reflecting the bias of data inputs.

  • According to Reuters, Amazon abandoned an AI tool that had learned to make hiring decisions that favored men. The e-commerce giant says that the tool “was never used by Amazon recruiters to evaluate candidates,” but it didn’t dispute that recruiters looked at the recommendations generated by the recruiting engine.
  • Other examples of discriminatory algorithms include Microsoft and IBM’s facial recognition tech, which Wired reported has lower accuracy rates for subjects who were not white men. Microsoft says it took steps to improve its facial recognition algorithm, while IBM said that it planned to deploy a new version of its service which incorporated the findings of the report.
  • Google’s online advertising algorithm showed ads for higher-paying jobs to men more often than to women, as explained by the Washington Post. Google said this might be because advertisers had specified that their ads should only be shown to certain users.
  • As reported by The Wall Street Journal, Google has also been criticized for its automatically generated “snippets,” single search results that claim to provide the answer to a specific question. These featured answers highlighted, for example, a site stating incorrectly that Barack Obama was a Muslim member of Congress. A spokeswoman for the company said that Google’s goal isn’t to do the thinking for users but “to help you find relevant information quickly and easily.”
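When a reporter can obtain or crowdsource outcome data, a first check for the kind of skew described in the hiring example is to compare selection rates across groups. The minimal Python sketch below computes each group's selection rate and its ratio to the highest group's rate. The 0.8 threshold mirrors the "four-fifths" rule of thumb from U.S. employment-selection guidelines; it is a screening heuristic, not proof of discrimination. The records and group labels are hypothetical.

```python
from collections import defaultdict


def selection_rates(records):
    """Return {group: share of that group's applicants who were selected}."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {g: selected[g] / totals[g] for g in totals}


def impact_ratios(records):
    """Ratio of each group's selection rate to the highest group's rate."""
    rates = selection_rates(records)
    best = max(rates.values())
    # Guard against a dataset in which nobody was selected.
    return {g: (r / best if best else 0.0) for g, r in rates.items()}


# Hypothetical audit data: (group, was the applicant recommended?)
records = (
    [("men", True)] * 30 + [("men", False)] * 70
    + [("women", True)] * 18 + [("women", False)] * 82
)

for group, ratio in sorted(impact_ratios(records).items()):
    note = "below 0.8, worth a closer look" if ratio < 0.8 else "no flag"
    print(f"{group}: {ratio:.2f} ({note})")
```

A low ratio is a prompt for further reporting (asking the company about its training data, or finding affected applicants), not a finding on its own.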

Implicit errors

Algorithms can also have implicit negative consequences, even if they function as programmed. These errors may speak to limitations in how companies are evaluating the scope of their algorithmic impact or failures in government regulation.

One example is YouTube’s recommendation algorithm. Its goal is presumably to keep the user on the site and to generate as many views as possible by recommending videos of interest to the user. The Guardian reported that several researchers, among them a former software engineer at the company, have noticed that the platform tends to suggest videos that promote extremist views like conspiracy theories. While this might help YouTube achieve its goal of more clicks, it may also violate common perceptions of a healthy media diet and might even have implications for democracy as a whole. The company said in a blog post at the end of January that it would take a “closer look” at ways to reduce the spread of content that borders on violating its community guidelines and “content that could misinform users in harmful ways.”

Algorithms that function correctly can also be deployed incorrectly, manipulated or used in unintended ways by users. For instance, Harvard Business Review reported on the variety of ways hackers game algorithmic security systems with fake data. When algorithms serve as gatekeepers, they can be susceptible to attacks from adversarial sources, including those who attempt ID theft by manipulating images on facial recognition systems. Another investigation by The Wall Street Journal explored Amazon’s efforts to prevent click farms and reviewers-for-hire from outsmarting its product-ranking system.

Research advancements

Journalists can play an important role in informing the public of advancements in algorithmic research that may yield new potential risks or offer solutions to old problems. For example, research developments in adaptive sampling have the potential to exponentially increase how quickly algorithms learn. And researchers have found a new way to use algorithms to help warn of heart attacks. This type of reporting can draw on traditional science and health reporting techniques to explain new methods and their potential implications to the layperson reader.

Public policy

Political responses to algorithmic technologies are increasingly newsworthy, whether it’s GDPR requirements for algorithmic accountability or debates over whether governments should regulate algorithmic biases, as reported by TechRepublic. The role of reporters here would be to contextualize proposed policies by evaluating the efficacy of planned regulations. Algorithms are also increasingly the subject of litigation, and reporting on these stories requires explaining the new implications of algorithms for existing laws, which were most likely written before computer programs became widespread.

Questioning algorithms

There are many elements of an algorithm that help determine its quality — and its impacts. These are some of the attributes that can help journalists guide their research:

  • Category: What does the algorithm do (e.g., filtering, prediction, ranking, calculation)?
  • Goal: What is the algorithm optimizing for (e.g., maximizing time spent on site)?
  • Data basis: What data is the algorithm based on and is there any bias within it?
  • Transparency: Is it clearly communicated to users how the algorithm makes decisions?
  • Human override: Can humans oversee the algorithm, quickly override its decisions and adjust it?
  • Explainability: Is the output of the algorithm explainable/interpretable?
  • Detected errors: Are there reported instances of mistakes the algorithm made — false positives (an unobjectionable video flagged as harmful) or false negatives (a harmful video that is not flagged)?
  • Fairness: Are certain groups advantaged or disadvantaged by this algorithm?
  • Privacy: Is usage data from service operations retained or shared with other users or third parties?
  • Robustness: Was the service checked for robustness against adversarial attacks?
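The "detected errors" and "fairness" questions can be combined: once a sample of the algorithm's decisions has been checked by human reviewers, a reporter can compare false positive and false negative rates between groups. The minimal Python sketch below assumes a hypothetical moderation audit in which each case records the group, whether the algorithm flagged the item, and whether reviewers judged it actually harmful; the group names and counts are invented for illustration.

```python
from collections import Counter


def error_rates(cases):
    """Per-group false positive and false negative rates.

    Each case is (group, flagged_by_algorithm, actually_harmful), where
    actually_harmful comes from human review (hypothetical ground truth).
    """
    counts = Counter()
    for group, flagged, harmful in cases:
        if harmful:
            counts[(group, "positives")] += 1
            if not flagged:
                counts[(group, "fn")] += 1  # harmful item the algorithm missed
        else:
            counts[(group, "negatives")] += 1
            if flagged:
                counts[(group, "fp")] += 1  # unobjectionable item wrongly flagged
    result = {}
    for g in sorted({g for g, _, _ in cases}):
        neg, pos = counts[(g, "negatives")], counts[(g, "positives")]
        result[g] = {
            "false_positive_rate": counts[(g, "fp")] / neg if neg else None,
            "false_negative_rate": counts[(g, "fn")] / pos if pos else None,
        }
    return result


# Hypothetical reviewed sample: 50 items per group.
cases = (
    [("group_a", True, False)] * 4 + [("group_a", False, False)] * 36
    + [("group_a", True, True)] * 8 + [("group_a", False, True)] * 2
    + [("group_b", True, False)] * 12 + [("group_b", False, False)] * 28
    + [("group_b", True, True)] * 5 + [("group_b", False, True)] * 5
)

for group, r in error_rates(cases).items():
    print(group, f"FPR={r['false_positive_rate']:.2f}",
          f"FNR={r['false_negative_rate']:.2f}")
```

Reporting both rates matters because an algorithm can look accurate overall while one group bears most of the wrongful flags and another most of the missed harms.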

Researching algorithms

With more advanced computational journalism or investigative sourcing, it’s possible to expose the inner workings of algorithms or uncover algorithmic errors. Processes through which journalists across newsrooms have evaluated algorithms from the outside include:

  • Scraping data: Where lawful, computer programs can be used to scrape data from websites (for example, prices or video views), which can then be used to help reverse-engineer elements of algorithms. But be aware that scraping might violate the website owner’s terms of service agreement, and there may be other legal concerns, such as claims that scraping is a form of hacking in violation of the Computer Fraud and Abuse Act.
  • Crowdsourcing data: Journalists can gather data from the public, such as using social media to crowdsource stories of algorithmic errors. But website owners can also restrict this access to their platform: Facebook, for example, is limiting access to its political ads and recently shut down part of a browser extension built by the journalism nonprofit ProPublica called the “Facebook Political Ad Collector,” with which citizens could collect and share data about the ads they see on the social network. The Knight First Amendment Institute at Columbia University sent a letter to Facebook in August 2018 calling for an amendment to the platform’s terms of service that would enable journalists to automatically collect public information and to create and use temporary research accounts for research projects.
  • Bot programs: Bots can help assess how algorithms behave differently for various usage patterns, such as logging in from different locations to evaluate geo-targeting. But again, as with scraping, there may be legal concerns with using bots, particularly around the use of any misleading or deceptive tactics.
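Scraping and bot-based audits often share two small steps: checking a site's robots.txt before collecting anything, and comparing what different simulated users were shown. The minimal Python sketch below assumes the robots.txt file and the result lists have already been captured; the user-agent string, URLs and story IDs are hypothetical. Honoring robots.txt is a courtesy convention and does not by itself settle the terms-of-service or Computer Fraud and Abuse Act questions raised above.

```python
from urllib import robotparser


def allowed_by_robots(robots_txt, user_agent, url):
    """Check an already-downloaded robots.txt body before scraping a URL.

    Note: robots.txt is a convention, not a legal safe harbor.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def jaccard(a, b):
    """Overlap of two result lists, ignoring order (1.0 means identical sets)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical robots.txt and bot name.
robots_txt = "User-agent: *\nDisallow: /private/\n"
agent = "newsroom-audit-bot"
print(allowed_by_robots(robots_txt, agent, "https://example.com/videos"))
print(allowed_by_robots(robots_txt, agent, "https://example.com/private/x"))

# Hypothetical recommendations captured by bots logged in from two locations.
from_city_a = ["story1", "story2", "story3", "story4"]
from_city_b = ["story1", "story2", "story5", "story6"]
print(f"overlap: {jaccard(from_city_a, from_city_b):.2f}")
```

A single low overlap score can be noise; repeating the capture many times per location, and comparing within-location overlap against between-location overlap, gives a sturdier basis for a geo-targeting claim.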

Technology changes, journalism prevails

When weighing whether to publish a story that reveals the inner workings of an algorithm, journalists should consider the consequences it might have for the organizations deploying the algorithm and for the people using or depending on it. Would disclosing how a specific algorithm works allow readers to manipulate or circumvent it in the future? Once people know which inputs and criteria a computer program takes into account, might they try to game the system for their benefit? It’s good practice to ask questions like: How might a story on a specific algorithm expose it to manipulation? And who would benefit from that manipulation?

As algorithms are used in more areas of society, the need for newsrooms to keep those systems in check will continue to grow. Given the complexity of auditing algorithms, it’s important to consider how promoting media literacy and developing insightful journalism can be leveraged to hold AI systems accountable and to keep citizens aware of their influence.

Francesco Marconi, Till Daldrup, and Rajiv Pant all work at The Wall Street Journal — as R&D chief, research fellow, and chief product and technology officer, respectively.

Image by Dimitris Ladopoulos used under a Creative Commons license.
