Do Language Models Know When They're Hallucinating References?

Ayush Agrawal, Mirac Suzgun, Lester Mackey, and Adam Tauman Kalai

arXiv:2305.18248 · cs.CL · March 21, 2024 · 23 cites

TL;DR

This paper investigates whether language models can identify hallucinated references by testing their consistency in recalling authors and details, revealing that models often recognize their own inaccuracies in references.

Contribution

The study introduces a method to detect hallucinated references in language models through internal consistency checks without external resources.
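
One way to picture such an internal consistency check is sketched below. It is only an illustration under stated assumptions, not the paper's implementation: the prompt wording, the `ask` callable (a stand-in for any language model API), the helper names, and the use of mean pairwise Jaccard overlap as the score are all hypothetical choices made for this example. The idea is to ask the same indirect question (who wrote this reference?) several times and measure how much the answers agree.

```python
from itertools import combinations
from typing import Callable, List, Set


def parse_authors(answer: str) -> Set[str]:
    """Split a comma/semicolon/'and'-separated author string into a set of names."""
    cleaned = answer.replace(";", ",").replace(" and ", ",")
    # Lowercase and collapse whitespace so trivial variants compare equal.
    return {" ".join(p.lower().split()) for p in cleaned.split(",") if p.strip()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two author sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def consistency_score(title: str, ask: Callable[[str], str], samples: int = 3) -> float:
    """Ask for the authors `samples` times and average the pairwise overlap.

    `ask` is a hypothetical callable wrapping a language model; the prompt
    text is an assumption made for this sketch.
    """
    prompt = f'Who are the authors of "{title}"? List names only.'
    answers = [parse_authors(ask(prompt)) for _ in range(samples)]
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def stub_model(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for a model: returns the canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


# Identical answers each time, as one would hope for a real reference.
stable_score = consistency_score(
    "Some Real Paper", stub_model(["Alice Smith, Bob Jones"] * 3)
)
# A different author list on every call, as might happen for an invented one.
unstable_score = consistency_score(
    "Some Invented Paper",
    stub_model(["Alice Smith, Bob Jones", "Carol White", "Dan Brown and Eve Black"]),
)
print(stable_score, unstable_score)
```

A low score would flag a reference as a candidate hallucination. Any threshold for that decision would have to be chosen empirically and is not specified here.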

Findings

01 GPT-4 often recalls the authors of hallucinated references inconsistently.

02 Language models can accurately recall the authors of real references.

03 Models can recognize when they are hallucinating references.

Abstract

State-of-the-art language models (LMs) are notoriously susceptible to generating hallucinated information. Such inaccurate outputs not only undermine the reliability of these models but also limit their use and raise serious concerns about misinformation and propaganda. In this work, we focus on hallucinated book and article references and present them as the "model organism" of language model hallucination research, due to their frequent and easy-to-discern nature. We posit that if a language model cites a particular reference in its output, then it should ideally possess sufficient information about its authors and content, among other relevant details. Using this basic insight, we illustrate that one can identify hallucinated references without ever consulting any external resources, by asking a set of direct or indirect queries to the language model about the references. These…

Code & Models

Repositories

microsoft/hallucinated-references (Official)

Taxonomy

Topics: Topic Modeling · Misinformation and Its Impacts · Ferroelectric and Negative Capacitance Devices

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Softmax · Dropout · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Position-Wise Feed-Forward Layer