The Tow Center for Digital Journalism released a study examining how ChatGPT produces citations for publishers' content, revealing serious concerns about the accuracy and reliability of those citations. The findings indicate that publishers are vulnerable to the tool's tendency to invent or misrepresent information, regardless of whether they have licensing agreements with OpenAI.
Key Findings
- Citation Accuracy: ChatGPT frequently misattributes or inaccurately cites sources. In a test of 200 quotes drawn from 20 publishers, ChatGPT returned fully or partially incorrect responses 153 times, and it acknowledged its inability to answer accurately only seven times.
- Impact on Publishers: Misattribution can harm publishers' reputations and dilute their brands. For example, ChatGPT incorrectly attributed a quote from the Orlando Sentinel to a Time article.
- Blocked Content: Even when publishers block OpenAI's crawlers, ChatGPT finds workarounds, sometimes citing copies of their work plagiarized by other websites.
- Inconsistent Responses: Because ChatGPT's underlying language model samples its output at a nonzero "temperature," the same query can produce different, and sometimes incorrect, answers from one run to the next, as illustrated in the sketch below.
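To make the temperature point concrete, here is a minimal sketch using OpenAI's Python client; the model name, prompt, and temperature value are illustrative assumptions, not details from the study:

```python
# Minimal sketch: repeating the same query at a nonzero temperature
# can yield a different attribution on each run.
# Assumes the openai package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

# Illustrative prompt; the study asked ChatGPT to identify the
# sources of verbatim quotes from publishers' articles.
prompt = 'Which publication originally published this quote: "..."'

for run in range(3):
    response = client.chat.completions.create(
        model="gpt-4o",   # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # higher values increase sampling randomness
    )
    print(f"Run {run + 1}: {response.choices[0].message.content}")
```

Lowering the temperature toward 0 makes decoding close to deterministic, but that control sits with OpenAI, not with publishers or end users.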
Implications for Publishers
- Trust and Recognition: Inaccurate citations undermine trust and fail to give proper recognition to original publishers.
- Brand Dilution: Generative search tools like ChatGPT risk distancing audiences from the original publishers and incentivizing plagiarism.
- Limited Control: Publishers have little control over how their content is represented, even if they allow or block OpenAI's crawlers.
The study highlights the need for OpenAI to ensure accurate and consistent representation of publisher content in its search product. Despite some progress, such as honoring crawler preferences declared in robots.txt files and creating citation mechanisms, significant flaws and inconsistencies remain. Publishers currently have little leverage to ensure their content is accurately presented in ChatGPT's search results.
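For reference, a sketch of the robots.txt preferences the study alludes to. The user-agent tokens below are the crawler names OpenAI publicly documents (GPTBot for training data, OAI-SearchBot for search indexing, ChatGPT-User for on-demand fetching); as the study shows, declaring them is no guarantee that a publisher's content stays out of ChatGPT's answers:

```
# Block OpenAI's training-data crawler
User-agent: GPTBot
Disallow: /

# Block the crawler that indexes content for ChatGPT search
User-agent: OAI-SearchBot
Disallow: /

# Block on-demand fetches triggered by user prompts in ChatGPT
User-agent: ChatGPT-User
Disallow: /
```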