Nieman Foundation at Harvard
Feb. 22, 2024, 12:27 p.m.

With The New York Times suing Microsoft and OpenAI for copyright infringement (a case the Times might well win, AI writer and researcher Timothy B. Lee and Cornell professor James Grimmelmann argued this week), it’s a good time to take a look at how news sites in general are responding to tech companies’ use of their content. A report out Thursday from the Reuters Institute for the Study of Journalism finds that nearly half (48%) of the top news publishers across 10 countries were blocking OpenAI from crawling their sites as of the end of 2023.

The websites of legacy print publications (like The New York Times and Der Spiegel) were more likely to block AI crawlers than TV and radio broadcasters or digital-born news sites — 57% of them were doing so, according to Richard Fletcher’s research.

News websites were less likely to block Google’s AI crawler than OpenAI’s, with a little less than a quarter doing so, but “almost every website (97%) that decided to block Google’s AI crawler was also blocking OpenAI’s crawlers.” From the report:

The proportion of top online news websites blocking OpenAI ranged from 79% in the US, to just 20% in Mexico and Poland. For Google, the proportion blocking their AI crawler ranged from 60% in Germany to 7% in Poland and Spain. In general, outlets in the Global North were more likely to be blocking than those in the Global South. (Interestingly, the figures are aligned with attempts to index countries in terms of AI capabilities and preparedness, such as those published by Tortoise and Oxford Insights, both of which rank the US first.)

In every country apart from Germany, where the figure was 60% for both, more top news websites blocked OpenAI’s crawlers than Google’s. Moreover, almost every website that blocked Google AI also blocked OpenAI (97%). This could be because ChatGPT is more prominent and widely used than Bard/Gemini, or it could be because the OpenAI crawler was released first. But it is also possible that publishers are more cautious about blocking Google in case it affects their prominence in search results — even though there are separate crawlers for search and AI.
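
For readers curious what "blocking" looks like in practice, here is a minimal sketch. It assumes the blocking is done through a site's robots.txt file, and that the relevant user-agent tokens are GPTBot for OpenAI and Google-Extended for Google's AI products, with Googlebot as the separate search crawler. The sample file and the `blocked_agents` helper are hypothetical, written for illustration only, and are not taken from the report.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only: it blocks the two AI
# crawlers by name and leaves every other crawler (including search) alone.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, agents: List[str],
                   url: str = "https://example.com/") -> Dict[str, bool]:
    """Return, for each user agent, whether robots_txt disallows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


result = blocked_agents(SAMPLE_ROBOTS_TXT, ["GPTBot", "Google-Extended", "Googlebot"])
print(result)
```

In this sample file, both AI crawlers are disallowed while Googlebot falls through to the catch-all rule and stays allowed, which is the separation between search and AI crawlers the report describes. Note that robots.txt is a request to crawlers, not an enforcement mechanism.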

You can read the research here.
