Nieman Foundation at Harvard
Aug. 14, 2023, 1:20 p.m.

The New York Times wants to go its own way on AI licensing

Expect the biggest media companies to use their market power to cut better deals with OpenAI and its peers.

When it’s time for news publishers and tech companies to negotiate, the rhetorical action is all about the little guy — the local newspapers, the rural publishers, the ones whose business models have taken the biggest beatings from Big Tech and who need the cash. But once rhetoric turns into reality, it’s usually the big dogs who end up getting fed.

In Australia, the government-mandated link tax has been generating a reported $50 million a year in Google/Facebook money for Rupert Murdoch’s News Corp properties — while smaller publishers had to fight for a seat at the table.

In the U.S., Google’s News Showcase — its “shut up, publishers, we’re paying for news now” scheme — gives the vast majority of publishers zilch, but it’s part of the $30 million-plus that Google reportedly gives The New York Times Co. each year.

The next arena for negotiation is artificial intelligence — specifically, whether companies like OpenAI can use the news stories publishers have put online to train their AIs. Past iterations of those chatbots have been trained on roughly the entire Internet, but now publishers (among others) are seeking to be compensated for their contributions to global knowledge.

It’s still early days for this round, but a lot of familiar plays are being called by all sides. (OpenAI, for instance, has discovered newfound interests in journalism education and local news startups, with a checkbook to match.) And two stories from the past few days make me suspect the big dogs will come out on top again.

First, on Thursday, Adweek’s Trishla Ostwal noticed that The New York Times made a small change to its terms of service recently:

The New York Times updated its terms of service Aug. 3 to forbid the scraping of its content to train a machine learning or AI system.

The content includes but is not limited to text, photographs, images, illustrations, designs, audio clips, video clips, “look and feel” and metadata, including the party credited as the provider of such content.

The updated TOS also prohibits website crawlers, which let pages get indexed for search results, from using content to train LLMs or AI systems.

Indeed, the updated TOS forbids using Times content in “training a machine learning or artificial intelligence (AI) system.”

Then, Sunday night, Semafor’s Max Tani reported that the Times would not be teaming up with other media companies to seek redress of their AI grievances:

The New York Times has decided not to join a group of media companies attempting to jointly negotiate with the major tech companies over use of their content to power artificial intelligence. The move is a major blow to Barry Diller’s efforts to establish an industry united front against Google and Microsoft.

Diller said at a Semafor media event in April that publishers should sue major tech companies that have trained their AI models on data produced by media organizations. As the Wall Street Journal and Semafor reported, his company IAC has been spearheading an effort to form a group of key publishers that would press for legislative and potential legal action to force the tech companies to pay billions of dollars back to those publishers. The presence of the two pillars of American news — the Times to the center-left and Journal to the center-right — would have been a powerful statement for that coalition.

But three sources with knowledge told Semafor that the Times is no longer a part of the effort. One person said that the Times had discussed joining the group, but never committed.

Note that Barry Diller’s group was never intended to represent “digital publishers” in the broad sense — as in, everyone who publishes things online. It was intended to represent “key” major media companies with the clout to demand compensation. But without The New York Times, it’ll be harder to claim to stand for even that. (News Corp has apparently reached the same conclusion as the Times.)

Barring Congressional action, we’re likely headed down a familiar path: Large tech companies will pick and choose what publishers they want to negotiate with and how much they’re willing to hand over. The numbers will be secret, and they’ll be heavily weighted toward the biggest players.

The moral case for the payments will be stronger this time around than it has been for link taxes. Training an AI doesn’t produce the obvious publisher benefit — traffic — that being indexed by Google or shared on Facebook does. And let’s just be honest: The archives of The New York Times (or the Associated Press) are vastly more valuable to a future ChatGPT than any collection of Gannett local papers could be. So the Times, smartly, has figured out that it’ll probably be better off cutting its own deals. But as you hear all that little-guy rhetoric in the coming years, remember that AI licensing isn’t likely to be an equal boon for everybody.

Joshua Benton is the senior writer and former director of Nieman Lab. You can reach him via email (joshua_benton@harvard.edu) or Twitter DM (@jbenton).