ChatGPT Hallucinates when Attributing Answers
Guido Zuccon, Bevan Koopman, Razia Shaik

TL;DR
This paper systematically analyzes whether ChatGPT can provide accurate evidence to support its answers. It finds that the model often cites references that do not exist or that do not support its claims, a form of hallucination.
Contribution
It is the first systematic study of ChatGPT's generated references, showing the model's limitations in attributing real evidence to its answers.
Findings
ChatGPT provides correct or partially correct answers in about half of the cases (50.6%).
Only 14% of references suggested by ChatGPT actually exist.
Even existing references often do not support the claims made by ChatGPT.
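The three findings above are ratios over annotated model outputs. As a rough illustration of how such rates could be computed, here is a minimal Python sketch. It is not the authors' evaluation code: the `Reference` and `Response` records, the label names, and the toy data are all hypothetical, and the toy numbers are deliberately unrelated to the paper's results.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reference:
    # Hypothetical annotation fields, assumed for this sketch.
    exists: bool          # could the cited source be located at all?
    supports_claim: bool  # does the source back the claim attributed to it?


@dataclass
class Response:
    # Hypothetical label set: "correct", "partial", or "incorrect".
    answer_label: str
    references: List[Reference] = field(default_factory=list)


def summarize(responses: List[Response]) -> Dict[str, float]:
    """Compute the three rates discussed in the findings."""
    n_answers = len(responses)
    # Answers counted as acceptable: correct or partially correct.
    n_ok = sum(r.answer_label in ("correct", "partial") for r in responses)

    # Pool every suggested reference across all responses.
    refs = [ref for r in responses for ref in r.references]
    existing = [ref for ref in refs if ref.exists]
    # Support is only meaningful for references that exist.
    n_supporting = sum(ref.supports_claim for ref in existing)

    return {
        "answer_ok_rate": n_ok / n_answers if n_answers else 0.0,
        "ref_exists_rate": len(existing) / len(refs) if refs else 0.0,
        "ref_supports_rate": n_supporting / len(existing) if existing else 0.0,
    }


# Toy, made-up annotations purely to exercise the function.
toy = [
    Response("correct", [Reference(True, True), Reference(False, False)]),
    Response("partial", [Reference(True, False)]),
    Response("incorrect", [Reference(False, False), Reference(False, False)]),
    Response("incorrect", []),
]

stats = summarize(toy)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Measuring support only over references that exist keeps the second and third findings separate: a reference can be real and still fail to back the claim attributed to it.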
Abstract
Can ChatGPT provide evidence to support its answers? Does the evidence it suggests actually exist, and does it really support its answer? We investigate these questions using a collection of domain-specific knowledge-based questions, specifically prompting ChatGPT to provide both an answer and supporting evidence in the form of references to external sources. We also investigate how different prompts impact answers and evidence. We find that ChatGPT provides correct or partially correct answers in about half of the cases (50.6% of the time), but its suggested references exist only 14% of the time. We further provide insights on the generated references that reveal common traits among the references that ChatGPT generates, and show that even when a reference provided by the model does exist, it often does not support the claims ChatGPT attributes to it. Our findings are…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
