Do Language Models Know When They're Hallucinating References?
Ayush Agrawal, Mirac Suzgun, Lester Mackey, and Adam Tauman Kalai

TL;DR
This paper investigates whether language models can tell when they are hallucinating references. It checks how consistently a model answers questions about a reference's authors and other details, and finds that models often give inconsistent answers for references they have made up.
Contribution
The study introduces a method for detecting hallucinated references through consistency checks on the language model's own answers, without consulting any external resources.
Findings
GPT-4 often gives inconsistent author lists for hallucinated references.
Language models often recall the authors of real references accurately.
In this sense, a model can be said to "know" when it is hallucinating a reference.
Abstract
State-of-the-art language models (LMs) are notoriously susceptible to generating hallucinated information. Such inaccurate outputs not only undermine the reliability of these models but also limit their use and raise serious concerns about misinformation and propaganda. In this work, we focus on hallucinated book and article references and present them as the "model organism" of language model hallucination research, due to their frequent and easy-to-discern nature. We posit that if a language model cites a particular reference in its output, then it should ideally possess sufficient information about its authors and content, among other relevant details. Using this basic insight, we illustrate that one can identify hallucinated references without ever consulting any external resources, by asking a set of direct or indirect queries to the language model about the references. These…
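The consistency idea described in the abstract can be illustrated with a small sketch. It assumes the model has already been asked several times for the authors of one reference and that the answers are available as plain strings. The function names, the last-name parsing rule, and the use of mean pairwise Jaccard overlap as the score are all assumptions made for illustration; this excerpt does not specify how the paper scores agreement.

```python
import re
from itertools import combinations


def parse_authors(response: str) -> frozenset:
    """Split a free-text author list into a set of lowercase last names.

    Hypothetical parsing rule: split on commas, semicolons, '&' and the
    word 'and', then keep the final token of each piece.
    """
    parts = re.split(r",|;|&|\band\b", response)
    names = set()
    for part in parts:
        tokens = part.strip().strip(".").split()
        if tokens:
            names.add(tokens[-1].lower())
    return frozenset(names)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def consistency_score(responses: list) -> float:
    """Mean pairwise Jaccard overlap across repeated author answers.

    This is an assumed stand-in metric, not the paper's scoring method.
    """
    author_sets = [parse_authors(r) for r in responses]
    pairs = list(combinations(author_sets, 2))
    if not pairs:
        raise ValueError("need at least two responses to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical answers sampled for two different references.
stable_answers = [
    "Ada Lovelace and Alan Turing",
    "Alan Turing, Ada Lovelace",
    "A. Lovelace and A. Turing",
]
unstable_answers = [
    "Jane Doe and John Smith",
    "Wei Chen, Maria Garcia",
    "Omar Khan",
]

print(consistency_score(stable_answers))
print(consistency_score(unstable_answers))
```

A score near 1 means the sampled answers agree on the authors, and a score near 0 means they do not. Any cutoff for flagging a reference as likely hallucinated would have to be tuned on labeled examples.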
Taxonomy
Topics: Topic Modeling · Misinformation and Its Impacts · Ferroelectric and Negative Capacitance Devices
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Softmax · Dropout · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Position-Wise Feed-Forward Layer
