Nieman Foundation at Harvard
Feb. 22, 2024, 12:27 p.m.

With The New York Times suing Microsoft and OpenAI for copyright infringement (a case the Times might well win, AI writer and researcher Timothy B. Lee and Cornell professor James Grimmelmann argued this week), it’s a good time to look at how news sites in general are responding to tech companies’ use of their content. A report out Thursday from the Reuters Institute for the Study of Journalism finds that nearly half (48%) of the top news publishers across 10 countries were blocking OpenAI from crawling their sites as of the end of 2023.

The websites of legacy print publications (like The New York Times and Der Spiegel) were more likely to block AI crawlers than those of TV and radio broadcasters or digital-born news sites: 57% of the print publications’ sites were doing so, according to Richard Fletcher’s research.

News websites were less likely to block Google’s AI crawler than OpenAI’s, with a little less than a quarter doing so, but “almost every website (97%) that decided to block Google’s AI crawler was also blocking OpenAI’s crawlers.” From the report:

The proportion of top online news websites blocking OpenAI ranged from 79% in the US, to just 20% in Mexico and Poland. For Google, the proportion blocking their AI crawler ranged from 60% in Germany to 7% in Poland and Spain. In general, outlets in the Global North were more likely to be blocking than those in the Global South. (Interestingly, the figures are aligned with attempts to index countries in terms of AI capabilities and preparedness, such as those published by Tortoise and Oxford Insights, both of which rank the US first.)

In every country apart from Germany, where the figure was 60% for both, more top news websites blocked OpenAI’s crawlers than Google’s. Moreover, almost every website that blocked Google AI also blocked OpenAI (97%). This could be because ChatGPT is more prominent and widely used than Bard/Gemini, or it could be because the OpenAI crawler was released first. But it is also possible that publishers are more cautious about blocking Google in case it affects their prominence in search results — even though there are separate crawlers for search and AI.
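For readers curious what "blocking" a crawler looks like in practice, here is a minimal sketch. It assumes a site expresses its preferences in a robots.txt file and that the relevant user-agent tokens are `GPTBot` (OpenAI) and `Google-Extended` (Google's AI token), with `Googlebot` standing in for ordinary search crawling. The sample file and the `blocked_agents` helper are hypothetical illustrations, not taken from the report or from any real publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written for illustration only.
# It disallows two AI-related tokens but leaves everything else open.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return {agent: True if the rules disallow that agent from fetching url}."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


status = blocked_agents(
    SAMPLE_ROBOTS_TXT, ["GPTBot", "Google-Extended", "Googlebot"]
)
print(status)
```

In this sample, the two AI tokens are disallowed while the search crawler is still permitted, which mirrors the report's point that search and AI crawling can be controlled separately.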

You can read the research here.
