Nov. 27, 2024, 11:19 a.m.
Business Models

Core copyright violation claim moves ahead in The Intercept’s lawsuit against OpenAI

The ruling comes after a judge dismissed similar claims filed by Raw Story and AlterNet earlier this month.

Last week, a New York federal judge ruled that a key copyright violation claim by The Intercept against OpenAI would move ahead in court. The ruling is the latest in a series of major legal decisions involving the AI developer this month, after OpenAI sought to dismiss lawsuits from several digital news publishers.

Judge Jed Rakoff said he’d hear the claim that OpenAI removed authorship information when it allegedly fed The Intercept’s articles into the training data sets it used to build ChatGPT. Doing so could be a violation of the Digital Millennium Copyright Act (DMCA), a 1998 law that, among other protections, makes it illegal to remove the author name, usage terms, or title from a digital work.

The judge dismissed The Intercept’s claim that OpenAI had knowingly distributed copies of its articles after removing the DMCA-protected information. He also dismissed all of The Intercept’s claims against Microsoft, which has a multibillion-dollar investment in OpenAI and was named in the initial filing. An opinion laying out the judge’s reasoning for the dismissals will be published in the coming weeks.

“The decision allows for a DMCA claim on behalf of digital publishers who do not have copyright registrations to proceed against OpenAI,” said Matt Topic, a partner at Loevy & Loevy, who is representing The Intercept. “We’re obviously disappointed to lose the claims against Microsoft, but the core claim is the DMCA claim against OpenAI, and we’re very happy to see that that will be going forward.”

“Our models are trained on publicly available data, grounded in fair use and related principles that we view as fair for creators,” OpenAI spokesperson Jason Deutrom said in a statement.

Earlier this year I reported that The Intercept’s case was carving out a new legal strategy for digital news publishers to sue OpenAI.

The New York Times’ lawsuit against OpenAI, and similar suits filed by The New York Daily News and Mother Jones, lead with claims of copyright infringement. Infringement suits require that relevant works were first registered with the U.S. Copyright Office (USCO). But most digital news publishers don’t have their article archives registered. For many, The Intercept included, filing all of their published work on the internet with the USCO is too costly or burdensome.

Until this summer, the USCO required each individual article webpage to be filed, and paid for, separately. In August, though, the office added a rule that allows “news websites” to file articles in bulk. Among other reasons, the decision cited concerns about unchecked infringement of online news content and a hope for copyright registrations to stay “adaptive to technological changes.” But for most digital news publishers seeking legal action against OpenAI, particularly over its use of their work to train ChatGPT, the new rule came too late.

For now, The Intercept’s case is the only litigation by a news publisher not tied to copyright infringement to move past the motion-to-dismiss stage.

Earlier this month, the DMCA-focused legal strategy took a major hit when another New York federal judge dismissed all DMCA claims against OpenAI filed by Raw Story and AlterNet. The progressive digital news sites are jointly represented by Loevy & Loevy.

“Let us be clear about what is really at stake here. The alleged injury for which Plaintiffs truly seek redress is not the exclusion of [content management information] from Defendants’ training sets, but rather Defendants’ use of Plaintiffs’ articles to develop ChatGPT without compensation,” wrote Judge Colleen McMahon in that decision.

Despite the setback, the judge said she would consider an amended complaint against OpenAI that took into account her concerns. A proposed amended complaint by Raw Story and AlterNet was filed by Loevy & Loevy last week, just before The Intercept ruling was announced.

“When they populated their training sets with works of journalism, Defendants had a choice: they could train ChatGPT using works of journalism with the copyright management information protected by the DMCA intact, or they could strip it away. Defendants chose the latter,” reads the proposed amended complaint. “In the process, [OpenAI] trained ChatGPT not to acknowledge or respect copyright, not to notify ChatGPT users when the responses they received were protected by journalists’ copyrights, and not to provide attribution when using the works of human journalists.”

Like The Intercept, Raw Story and AlterNet are asking for $2,500 in damages for each instance in which OpenAI allegedly removed DMCA-protected information in its training data sets. If damages are calculated per individual article allegedly used to train ChatGPT, the number of violations could quickly balloon into the tens of thousands.

“The proposed amended complaint would match up and probably even go beyond the allegations that survived in The Intercept case,” said Topic. “Different judges could come out differently on the same question, but we feel optimistic that we will have the opportunity to proceed with an amended claim.”

It is unclear whether The Intercept ruling will embolden other publications to consider DMCA litigation; few have followed in its footsteps so far. As time goes on, there is concern that new suits against OpenAI could run up against statute-of-limitations restrictions, particularly if news publishers want to cite the training data sets underlying ChatGPT. But the ruling is one signal that Loevy & Loevy is homing in on a specific DMCA claim that can actually stand up in court.

“We do think that the claim that has survived for The Intercept is a claim that most digital publishers would also be able to bring,” said Topic.


Andrew Deck is a generative AI staff writer at Nieman Lab. Have tips about how AI is being used in your newsroom? You can reach Andrew via email (andrew_deck@harvard.edu), Twitter (@decka227), or Signal (+1 203-841-6241).