Study of ChatGPT citations makes depressing reading for publishers.

As a growing number of publishers sign content licensing agreements with ChatGPT maker OpenAI, a study published this week by the Tow Center for Digital Journalism makes for interesting reading: it examines how the AI chatbot generates citations (i.e., sources) for publishers’ content.

In short, the research shows that whether or not publishers allow OpenAI to crawl their content, they are still subject to the propensity of generative AI tools to invent or misrepresent information.

The study, conducted at Columbia Journalism School, examined citations produced by ChatGPT after it was asked to identify the source of sample quotes drawn from a range of publishers, some of which have signed deals with OpenAI and some of which have not.

The center took block quotes from 10 articles apiece from 20 randomly selected publishers (200 different quotes in all). These included content from The New York Times (which is currently suing OpenAI over copyright), The Washington Post (which has no affiliation with the ChatGPT maker), The Financial Times (which has a licensing agreement), and others.

“We selected quotes that, when pasted into Google or Bing, returned the original article among the top three results and assessed whether OpenAI’s new search tool correctly identified the article from which each quote came,” wrote Tow researchers Klaudia Jaźwińska and Aisvarya Chandrasekar in a blog post describing their approach and summarizing the results.

“Our findings do not bode well for news publishers,” they continue. “While OpenAI emphasizes its ability to provide users with ‘timely answers with links to relevant web sources,’ the company makes no explicit commitment to ensuring the accuracy of those citations. This is a notable omission for publishers who expect their content to be referenced and represented faithfully.”

“Our testing shows that no publisher, regardless of their degree of affiliation with OpenAI, is spared inaccurate representations of their content on ChatGPT,” they added.

Unreliable sourcing

The researchers say they found “numerous” instances where publishers’ content was cited inaccurately by ChatGPT, and they also discovered what they call “a spectrum of accuracy in the responses.” So while they found “some” citations that were entirely correct (meaning ChatGPT accurately returned the publisher, date, and URL of the block quote shared with it), there were “many” citations that were entirely wrong, and “some” that fell somewhere in between.
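The three-tier accuracy rubric described above (fully correct when publisher, date, and URL all match; partially correct when some fields match; otherwise wrong) can be sketched as a small scoring function. This is an illustrative reconstruction, not the Tow Center’s actual methodology code; the field names and exact-match rule are assumptions.

```python
def classify_citation(expected: dict, returned: dict) -> str:
    """Classify a chatbot citation against a ground-truth record.

    Hypothetical rubric based on the study's description: a citation
    is fully correct only when publisher, date, and URL all match;
    partially correct when some but not all fields match; incorrect
    when none do.
    """
    fields = ("publisher", "date", "url")
    matches = sum(expected[f] == returned.get(f) for f in fields)
    if matches == len(fields):
        return "correct"
    if matches > 0:
        return "partially correct"
    return "incorrect"


# Example: a citation that gets the publisher and date right but
# links to the wrong page would land in the middle of the spectrum.
truth = {"publisher": "Example Post", "date": "2024-11-01",
         "url": "https://example.com/story"}
guess = {"publisher": "Example Post", "date": "2024-11-01",
         "url": "https://other-site.com/copy"}
print(classify_citation(truth, guess))
```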

In short, ChatGPT’s citations appear to be an unreliable mixed bag. More worryingly, the researchers found very few cases where the chatbot didn’t project full confidence in its (incorrect) answers.

Some of the quotes came from publishers that have actively blocked OpenAI’s search crawler. In those cases, the researchers say, they expected problems generating accurate citations. But they found that this scenario raised a different problem: the bot “rarely” admitted it was unable to produce an answer. Instead, it resorted to confabulation in order to generate some sourcing (albeit inaccurate sourcing).

“In total, ChatGPT returned partially or completely incorrect responses 153 times, but only acknowledged seven times that it could not answer a query accurately,” the researchers said. “Only in these seven outputs did the chatbot use qualifying words and phrases such as ‘appears’, ‘possible’, ‘maybe’, or statements such as ‘exact article not found’.”

They compare this unhappy situation with a standard internet search, where a search engine such as Google or Bing would typically either locate the exact quote and point the user to the website where it was found, or state that it could find no exact match.

ChatGPT’s false confidence, they claim, “can make it difficult for users to evaluate the validity of claims and understand which parts of the answer are trustworthy or unreliable.”

For publishers, they suggest, not only can inaccurate citations pose reputational risks, but there can also be commercial risks of readers being pointed elsewhere.

Decontextualized data

The study also highlights another problem: it suggests ChatGPT may effectively be rewarding plagiarism. The researchers describe an instance where ChatGPT erroneously cited a website that had plagiarized a piece of “heavily reported” New York Times journalism, i.e., copy-pasting the text without attribution, as the source of the NYT story. In that case, they suggest, the bot may have generated this false response to fill an information gap caused by its inability to crawl the NYT’s website.

“This raises serious questions about OpenAI’s ability to filter and verify the quality and reliability of data sources, especially when dealing with unlicensed or plagiarized content,” they suggest.

In a further worrying finding for publishers that have deals with OpenAI, the study found that ChatGPT’s citations were not always reliable even in their cases. So letting OpenAI’s crawlers in doesn’t appear to guarantee accuracy, either.

The fundamental problem, the researchers argue, is that OpenAI’s technology is treating journalism as “decontextualized content” with little regard for its original production context.

Another issue flagged in the study is the variability of ChatGPT’s responses. The researchers asked the bot the same query multiple times and found it “typically returned a different answer each time.” While that’s common for generative AI tools, in a citations context such inconsistency is obviously suboptimal when accuracy is what you’re after.

Although the Tow study is small scale (the researchers acknowledge that “more rigorous” testing is needed), it is nonetheless noteworthy given the high-profile deals that major publishers are busy cutting with OpenAI.

If media companies had hoped that these agreements would lead to special treatment for their content compared to competitors, at least in terms of generating accurate sourcing, this research suggests that OpenAI has not yet provided that consistency.

For publishers that have no licensing agreement but also haven’t outright blocked OpenAI’s crawlers, perhaps in the hope of picking up at least some traffic when ChatGPT returns content about their stories, the study also makes grim reading, since citations may not be accurate in their cases either.

In other words, there is no guarantee of “visibility” for publishers in OpenAI’s search engine even when they do allow its crawlers in.

Nor does completely blocking crawlers mean publishers can protect themselves from reputational risk by keeping their stories out of ChatGPT. The study found, for example, that the bot still misattributed articles to The New York Times despite the ongoing litigation.

‘Little meaningful agency’

The researchers concluded that as things stand, publishers have “little meaningful agency” over what happens to their content when ChatGPT accesses it (directly or indirectly).

The blog post includes a response from OpenAI to the study’s findings, which accuses the researchers of running “unstructured tests on our products.”

“We support publishers and creators by helping ChatGPT’s 250 million weekly users find high-quality content through abstracts, quotes, clear links and attribution,” OpenAI also said, adding that it respects publisher preferences, “including enabling how they appear in search by managing OAI-SearchBot in robots.txt,” and that it will “continue to improve our search results.”
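For illustration, the robots.txt mechanism OpenAI refers to might look like the fragment below. OAI-SearchBot and GPTBot are OpenAI’s documented crawler names as of this writing, but any such file should be checked against OpenAI’s current bot documentation before use.

```
# Block OpenAI's search crawler (controls appearance in ChatGPT search)
User-agent: OAI-SearchBot
Disallow: /

# Block OpenAI's training crawler (controls use of content for model training)
User-agent: GPTBot
Disallow: /
```

Because the two crawlers are separate user agents, a publisher can in principle allow one and disallow the other, separating search visibility from training-data collection.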