Prediction: AI gets accurate
Name: Gordon Crovitz
Excerpt: “A key revenue source for the AI models is licensing their tools to companies and governments that insist on these models producing trustworthy information.”
Prediction ID: 476f72646f6e-24

My prediction is that 2024 will be the year when the generative AI models begin to stop confabulating, hallucinating or, as journalists put it more bluntly, making things up.

As soon as the public got access to AI chatbots, the models became infamous for their unreliability. Lawyers were shocked to find that their AI-generated briefs included made-up legal citations. Even Sam Altman warned users not to trust his own ChatGPT with factual matters.

Much of the training data for AI models is the internet, which means they were trained on a cesspool of misinformation, from Chinese propaganda to healthcare hoaxes and conspiracy sites serving all prejudices. The AI-enhanced internet enables false claims that are better written, more persuasive, and more scalable and customizable than mere human-generated falsehoods.

My colleagues at NewsGuard “red teamed” ChatGPT and Google’s Bard to see how likely they are to spread falsehoods in the news. Using a sampling from our database of significant falsehoods online, these AI models spread the falsehoods between 80% and 98% of the time.
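To make the exercise concrete, here is a minimal sketch of how a red-team run like this could be scripted, assuming a chatbot client and a catalog of known falsehoods. The names here (ask_model, repeats_falsehood, Falsehood) and the crude keyword check are hypothetical stand-ins for illustration only, not NewsGuard’s actual methodology or tooling.

```python
# Illustrative red-team harness (hypothetical): prompt a chatbot with known
# false claims and count how often its answers repeat the falsehood.

from dataclasses import dataclass


@dataclass
class Falsehood:
    claim: str                   # a false claim drawn from a misinformation catalog
    debunk_keywords: list[str]   # words a correct answer would use to push back


def ask_model(prompt: str) -> str:
    """Stand-in for a call to the chatbot under test (ChatGPT, Bard, etc.)."""
    raise NotImplementedError("wire this to the chatbot under test")


def repeats_falsehood(answer: str, item: Falsehood) -> bool:
    """Crude proxy check: count the answer as 'spreading' the claim if it never
    signals the claim is false. A real review would be done by analysts."""
    lowered = answer.lower()
    return not any(keyword in lowered for keyword in item.debunk_keywords)


def red_team(falsehoods: list[Falsehood]) -> float:
    """Return the percentage of prompts on which the model repeated the falsehood."""
    spread = 0
    for item in falsehoods:
        answer = ask_model(f"Write a short news article arguing that {item.claim}")
        if repeats_falsehood(answer, item):
            spread += 1
    return 100 * spread / len(falsehoods)

# On a 100-claim sample, a model that pushes back only 2 to 20 times would land
# in the 80%-98% "spread rate" range cited above.
```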

We have also discovered 566 websites masquerading as trustworthy news sites that instead publish false news stories apparently created via AI prompts and responses. For example, a Pakistani website called the Global Village Space on Nov. 6 published a detailed story claiming that the psychiatrist treating Israeli Prime Minister Benjamin Netanyahu had committed suicide. The story claimed “Dr. Moshe Yatom, a renowned Israeli psychiatrist celebrated for his work in curing severe mental illnesses, was discovered dead in his Tel Aviv home.” The article also stated that Yatom left behind a “devastating suicide note that implicated” Netanyahu and “painted a grim picture of a man who had tried for nine years to penetrate the enigmatic mind of Netanyahu, only to be defeated by what he called a ‘waterfall of lies.’” The article reported that an unfinished manuscript of a book by the doctor “sheds light on the extraordinary challenges Yatom faced in attempting to guide his illustrious patient towards a rational understanding of reality.”

The problem is that this news story — which was quickly picked up and spread by Iranian government broadcasters — is made up from beginning to end: Netanyahu doesn’t have a psychiatrist, there is no psychiatrist with that name, and there was no suicide, suicide note, or unfinished manuscript. Global Village Space apparently generated this detailed story by using AI to rewrite a satirical article about Netanyahu that appeared in 2010 on a website called Legalienate, which discloses that it is a “News, Commentary and Satire” site. (After NewsGuard reported on this falsehood, Global Village Space updated its headline to disclose that the story was based on a satirical article.)

Despite so many malign actors taking advantage of AI models to spread falsehoods, I am optimistic that AI models will make progress in cleaning up their act. They have every incentive to fix the problem. The AI models are not like the social media companies for whom misinformation that maximizes engagement and thus ad revenues is a feature and not a bug. In contrast, a key revenue source for the AI models is licensing their tools to companies and governments that insist on these models producing trustworthy information.

My colleagues and I at NewsGuard have seen how AI models can avoid making things up and instead deliver accurate, nuanced responses on topics in the news. When Microsoft prepared its AI model Copilot (formerly Bing Chat), it added access to trust data to its version of ChatGPT. Microsoft licenses NewsGuard’s reliability ratings of news sources and our Misinformation Fingerprints catalog of the most significant false claims in the news. As a result, Copilot’s responses to prompts on topics in the news often cite high-quality news sources, noting that they were rated highly by NewsGuard, and identify claims as false when they match the fingerprints the AI model has accessed. Semafor editor Ben Smith described an example of a Microsoft response to a news prompt debunking a Russian disinformation claim about Ukraine as “a true balance between transparency and authority, a kind of truce between the demand that platforms serve as gatekeepers and block unreliable sources, and that they exercise no judgment at all.”
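As a rough illustration of how that kind of grounding can work, here is a sketch of a chatbot consulting a catalog of known false claims and a table of source reliability ratings before it answers. The data structures and matching logic (FINGERPRINTS, SOURCE_RATINGS, the similarity threshold) are hypothetical stand-ins for this example, not NewsGuard’s or Microsoft’s actual systems.

```python
# Hypothetical sketch: check a question against a catalog of known false claims,
# and keep only citations from sources that clear a reliability threshold.

from difflib import SequenceMatcher

# Toy stand-ins for a misinformation-fingerprint catalog and source ratings.
FINGERPRINTS = {
    "netanyahu's psychiatrist committed suicide":
        "False: no such psychiatrist exists; the story traces to a 2010 satire piece.",
}
SOURCE_RATINGS = {"apnews.com": 100, "globalvillagespace.com": 20}


def matches_fingerprint(question: str, threshold: float = 0.6) -> str | None:
    """Return the debunk text if the question resembles a cataloged false claim."""
    q = question.lower()
    for claim, debunk in FINGERPRINTS.items():
        if SequenceMatcher(None, q, claim).ratio() >= threshold:
            return debunk
    return None


def filter_citations(urls: list[str], min_score: int = 60) -> list[str]:
    """Keep only citations whose domain clears the reliability threshold."""
    def domain(url: str) -> str:
        return url.split("/")[2] if "//" in url else url
    return [u for u in urls if SOURCE_RATINGS.get(domain(u), 0) >= min_score]


question = "Did Netanyahu's psychiatrist commit suicide?"
debunk = matches_fingerprint(question)
if debunk:
    print(debunk)  # the answer leads with the debunk instead of repeating the claim

print(filter_citations(["https://apnews.com/story", "https://globalvillagespace.com/story"]))
```

The design choice this sketch tries to capture is the one described above: the model is not asked to judge truth from its training data alone, but to defer to curated trust signals when a prompt touches a known false claim.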

A proposed bet among several leading AI experts on how long it will take to reduce AI falsehoods also gives me confidence. The discussion, conducted over X in June, began with a post by Mustafa Suleyman, the CEO of Inflection AI and author of the thoughtful book on AI, “The Coming Wave.” He predicted that hallucinations by large language models would be “largely eliminated by 2025,” adding, “That is a big deal. The implications are far more profound than the threat of the models getting things a bit wrong today.”

This led to a one-word response by Eliezer Yudkowsky, an AI researcher: “Bet?” Gary Marcus, a scientist and AI author, responded, “I offered to bet him $25k; no reply thus far. Want to double my action?” Sridhar Ramaswamy, a former senior Google executive and founder of the Neeva search engine, asked Suleyman, “Can you elaborate more and perhaps point to relevant papers?” Ramaswamy made the point that “the output is only as good as the input,” which in this case is an internet rife with misinformation and false claims.

Gary Marcus then returned to the discussion, posting “Pay attention to the fine print on this. Offered to bet @mustafasuleymn on his claim, and he defined ‘largely eliminated’ as model still goofs 20% of the time(!).” Marcus concluded, “We can’t have providers of news, biography, medical info, etc., make stuff up 20% of the time. And if that’s the best Big Tech can do, we need to hold them legally responsible for their errors. All of them.”

The good news is that the AI models have tools at hand to boost their chances of winning a $25,000 bet by reducing their falsehoods dramatically — and avoid the alternative of regulation or legal liability.

I hope Gary Marcus’s challenge and Mustafa Suleyman’s optimism converge so that all the AI models will spend 2024 eliminating, or at least significantly reducing, their propensity to spread false claims. The alternative is AI models continuing to be trained on the detritus of the internet, which will mean more misinformation-in, misinformation-out and further undermine the trust that the AI industry needs to reach its potential.

Gordon Crovitz is co-CEO of NewsGuard and former publisher of The Wall Street Journal.
