Nieman Foundation at Harvard
Sept. 27, 2023, 11:59 a.m.
Business Models

Press freedom means controlling the language of AI

Generative AI systems act like “stochastic parrots,” using statistical models to guess word orders and pixel placements. That’s incompatible with a free press that commands its own words.

Generative AI poses the biggest threat to press freedom in decades, and journalists should act quickly to organize themselves and radically reshape its power to produce news.

The news industry has long been buffeted by economic and technological challenges beyond its control. From early innovations in moveable type and photo reproduction to more recent concerns about search engine rankings, social media algorithms, audience analytics, pulled tech funding, and the infamous “pivot to video,” news organizations have tried to adapt to rapidly evolving information ecosystems, following the power and money of new technologies while simultaneously trying to make them align with news values and editorial judgments. The press today is dependent on distributed, technological infrastructures owned and operated by a select few powerful corporations.

And now along comes GenAI, threatening to upend industries. Teachers are questioning the value of essay assignments, doctors are using GenAI to communicate with patients, and Hollywood actors and writers are mounting vigorous defenses against studios aiming to use GenAI to create scripts, capture actors’ likenesses, and synthetically generate films.

Journalism, too, is trying to understand and harness GenAI’s power. There are countless experiments to computationally fabricate headlines, stories, images, videos, podcasts, broadcast personalities, and even interviews through easy-to-use off-the-shelf technologies that until recently were the stuff of industry prototypes and computer science labs. Though newsrooms have used some version of AI for years — to craft simple stories, search archives, test headlines, and analyze audience data — and journalists have developed forensic fact-checking routines to guard against fake media, today’s journalists are also rapidly experimenting with synthetic media tools like ChatGPT, Bard, DALL-E, Google’s Genesis prototype, and many other competitors.

News Corp uses GenAI to create about 3,000 local Australian news stories each week. NPR’s Planet Money used GenAI to script an episode with cloned voices. Kuwait News used GenAI to fabricate a television news presenter. CNET experimented with using GenAI to write dozens of stories (though many articles contained errors). Many newsrooms have been busy developing their own rules around the use of GenAI. And earlier this summer Google pitched a GenAI “helpmate” that it said could generate news stories.

GenAI looks likely to become entrenched across the news industry, deepening even further the press’s dependence on tech companies, their data infrastructures, and often inscrutable machine learning models. News organizations may soon outsource to tech companies not only the power and responsibility to disseminate and curate news, but to create it in the first place.

This power goes to the core of journalism’s public service, namely its capacity and obligation to artfully, eloquently, and intentionally use language to create and debate the ground truths that anchor shared social realities. As countless journalism scholars have shown, and expert practitioners know, the words journalism uses matter like no other words because, at their best, journalism’s words emerge out of public service, unimpeachable reporting, self-reflexive news judgment, eloquent storytelling, rigorous editing, and timely publication. News is not “content,” readers are not “users,” stories are not “syntheses.” A truly free press controls its language from start to finish. It knows where its words come from, how to choose and defend them, and the power that comes from using them on the public’s behalf.


But GenAI language has no such commitment to the truth, eloquence, or the public interest. GenAI systems use statistical models to guess which word orders and pixel placements fit patterns that computational models have identified in vast and largely unexamined datasets. They act like “stochastic parrots.” Journalism that uses the large language models and statistical patterns of Big Tech’s GenAI runs the risk of being not just biased or boring. Such journalism is potentially anathema to a free press because it surrenders the autonomy — not to mention aesthetic joy — that comes from knowing why and how to use language.

As scholars of journalism and press freedom, we’ve been tracking these developments closely. Through our work with Knowing Machines — a research project that traces the history, practices, and politics of machine learning systems like GenAI — we’ve been analyzing hundreds of news stories, policies, guidelines, commentaries, and think pieces on GenAI in journalism. And we are concerned that these implications for press freedom — beyond reporting workflows, newsroom policies, and labor pressures — have not been at the center of the public conversation that journalism needs to have about GenAI.

Up until now, this conversation has reflected a widespread sense of inevitability about the technology and its ability to replace, improve, or swallow the news. Hoping to protect an already economically and politically battered industry, journalists have settled into largely defensive, reactive postures, focused on two interrelated concerns: protecting newsroom jobs and business models, and ensuring that GenAI — with its hallucinations of misinformation and high-profile errors — passes journalistic tests of truthfulness.

These focuses on labor and truth are important and understandable, but they miss what’s happening below the surface. The view that GenAI is just the latest technological tool for journalists and news organizations to harness responsibly ignores how it dramatically increases the industry’s dependence on tech companies in unsettling and often unknowable ways, including by relying on GenAI to classify, compute, and create the language that journalists use to tell stories — the stories that we use to know and govern ourselves.

There are two ways that journalists could use this GenAI moment to defend and even strengthen press freedom.

First, taking a page from the Writers and Screen Actors Guilds, and aligning with some newsroom unions, journalists could find their collective voice on GenAI. Indeed, we’ve started to see some halting but hopeful efforts at collective action. Some publishers are attempting to form a coalition to demand fair compensation from GenAI companies that use news copy to train their models — though key outlets like The New York Times and AP appear intent on going it alone. And newsroom unions are pushing for greater worker protection, though those efforts are so far focused on mitigating the effects of GenAI.

Journalists could look beyond concerns about copyright and automated labor to ask whether GenAI’s synthetic, statistical, and proprietary nature — its language comes from systems controlled by a few powerful people — is even compatible with a free press that commands its own words. Journalists could ask how their obligations as public servants with constitutionally protected rights align with their willingness to use ideas, headlines, ledes, phrasings, edits, images, and more made by unaccountable, privately controlled, and often opaque computational systems using datasets that are known to be flawed. If journalists could speak to technology companies with one voice, they could assert their power and radically reshape GenAI, demanding that tech companies help support the press that people need.

Second, for journalists to remake GenAI in the public interest, they need to use their collective voice to change GenAI infrastructure. This means more than learning how to ask ChatGPT questions, reacting to the answers, labeling GenAI news, or verifying facts in a GenAI story. It means critiquing, remaking, and rejecting GenAI systems when journalists judge them to be inadequate for news work. Hidden beneath GenAI outputs is a vast ocean of data with histories and politics that make GenAI anything but neutral or objective. Who categorized and labeled the dataset? What is overrepresented in or absent from it? How often does the system fail, and who suffers most from those mistakes? Who within newsrooms has the power to license GenAI systems, what do they ask before rolling out a new tool, and how much power do journalists have to refuse some or all of a GenAI infrastructure? Are journalism students being trained not just to use data within stories, but to interrogate the politics of GenAI datasets and models, including the ones in their own newsrooms?

Journalists need to get good, fast, at looking under the hood of GenAI systems and developing the power to shape them. This means examining datasets, categories, assumptions, failure rates, engineering cultures, and economic imperatives driving GenAI systems. Reporters need to ask hard questions about how GenAI might reveal confidential data, put sources at risk, and damagingly confuse carefully crafted news with other types of “data” or “content.” Although journalists may become seduced by GenAI’s promises of statistical language without politics, they should see how such synthetic media is often incompatible with the more subtle and precise language that experienced journalists take years to learn how to use.


A free and responsible press is an eloquent press that knows its words. It knows the politics of its words, it defends its language with courage, and it changes its words when it knows that it should. As an institution — not simply individual journalists or organizations making their own decisions — the press needs to know how GenAI either fulfills or harms its public mission. It needs to speak with one voice to technology companies peddling GenAI tools, using its unique power as a vital and constitutionally protected public institution to reject technological takeovers of the language it uses on the public’s behalf. If GenAI stays a technical curiosity, a fetishized fad, an unknowable mystery, or a seemingly neutral tool, journalists run the risk of gaslighting themselves out of a mission, gradually accepting the topics, facts, stories, words, and faux eloquence of GenAI as “good enough.” Such a press would be anything but free.

Mike Ananny is an associate professor of communication and journalism at the University of Southern California Annenberg School. Jake Karr is the deputy director of New York University’s Technology Law and Policy Clinic. Both are members of Knowing Machines, a research project tracing the histories, practices, and politics of how machine learning systems are trained to interpret the world.

[Article illustration generated from the prompt: “A pet shop that only sells parrots. A wall of parrot cages. Someone has opened the doors to the parrot cages and the parrots are escaping and flying in a panic. There are newspapers everywhere.”]
