First Hallucination Tokens Are Different from Conditional Ones
Jakob Snel, Seong Joon Oh

TL;DR
This paper shows that the first hallucinated token in a large language model's output is far easier to detect than the hallucinated tokens that follow it, a structural property that can inform hallucination detection methods.
Contribution
Using token-level annotations from the RAGTruth corpus, the paper shows that the first token of a hallucinated sequence is more detectable than the tokens that follow it, and argues that this matters for fine-grained hallucination detection.
Findings
First hallucination tokens are more detectable than later ones.
The structural property holds across different models.
The result suggests that first hallucination tokens play a key role in token-level hallucination detection.
Abstract
Large Language Models (LLMs) hallucinate, and detecting these cases is key to ensuring trust. While many approaches address hallucination detection at the response or span level, recent work explores token-level detection, enabling more fine-grained intervention. However, the distribution of hallucination signal across sequences of hallucinated tokens remains unexplored. We leverage token-level annotations from the RAGTruth corpus and find that the first hallucinated token is far more detectable than later ones. This structural property holds across models, suggesting that first hallucination tokens play a key role in token-level hallucination detection. Our code is available at https://github.com/jakobsnl/RAGTruth_Xtended.
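The distinction between first and later hallucinated tokens can be made concrete with a small sketch. It assumes a hypothetical representation in which each generated token carries a binary label (1 = hallucinated, 0 = not), and it treats every maximal run of 1s as one hallucinated sequence. The function name and label format are illustrative assumptions, not the authors' code or the RAGTruth data format.

```python
from typing import List, Tuple


def split_first_and_conditional(labels: List[int]) -> Tuple[List[int], List[int]]:
    """Split hallucinated token positions into 'first' and 'conditional' ones.

    Assumption (not from the paper's code): `labels` holds one binary flag per
    token, and a maximal run of 1s counts as a single hallucinated sequence.
    """
    first: List[int] = []        # positions that open a hallucinated run
    conditional: List[int] = []  # positions that continue a run
    prev = 0
    for i, lab in enumerate(labels):
        if lab == 1:
            # A hallucinated token is 'first' if the previous token was clean
            # (or if it is the very first token of the response).
            (first if prev == 0 else conditional).append(i)
        prev = lab
    return first, conditional


labels = [0, 1, 1, 1, 0, 0, 1, 1]
first, conditional = split_first_and_conditional(labels)
print(first)        # [1, 6]
print(conditional)  # [2, 3, 7]
```

With the positions split this way, a detector's scores could be evaluated separately on the two groups to compare how detectable each one is. Two annotated spans that are directly adjacent would be merged into one run under this simplified labeling.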
Taxonomy
Topics: Psychedelics and Drug Studies · Hallucinations in medical conditions · Biofield Effects and Biophysics
