Attention Sinks as Internal Signals for Hallucination Detection in Large Language Models

Jakub Binkowski; Kamil Adamczewski; Tomasz Kajdanowicz

arXiv:2604.10697·cs.CL·April 14, 2026

TL;DR

This paper introduces SinkProbe, a hallucination detection method for large language models that scores attention sinks in the model's attention maps and reports state-of-the-art results.

Contribution

The work provides a theoretically grounded approach to hallucination detection based on attention sinks, shows that earlier attention-based detectors implicitly rely on those sinks, and improves detection accuracy.

Findings

01. Sink scores correlate with tokens having large value vector norms.

02. Previous detection methods implicitly rely on attention sinks.

03. SinkProbe achieves state-of-the-art results across datasets and models.

Abstract

Large language models frequently exhibit hallucinations: fluent and confident outputs that are factually incorrect or unsupported by the input context. While recent hallucination detection methods have explored various features derived from attention maps, the underlying mechanisms they exploit remain poorly understood. In this work, we propose SinkProbe, a hallucination detection method grounded in the observation that hallucinations are deeply entangled with attention sinks - tokens that accumulate disproportionate attention mass during generation - indicating a transition from distributed, input-grounded attention to compressed, prior-dominated computation. Importantly, although sink scores are computed solely from attention maps, we find that the classifier preferentially relies on sinks whose associated value vectors have large norms. Moreover, we show that previous methods…
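To make the idea of a token that "accumulates disproportionate attention mass" concrete, here is a minimal sketch of one way to score sinks from a single causal attention map. It is an illustration under stated assumptions, not SinkProbe itself: the function names `causal_attention` and `sink_scores`, the choice of the mean attention received per key position as the score, and the toy score matrix are all hypothetical and do not come from the paper.

```python
import numpy as np


def causal_attention(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax over a causally masked (T x T) score matrix."""
    T = scores.shape[0]
    visible = np.tril(np.ones((T, T), dtype=bool))  # query i sees keys 0..i
    masked = np.where(visible, scores, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(masked)  # masked entries become exactly 0
    return weights / weights.sum(axis=1, keepdims=True)


def sink_scores(attn: np.ndarray) -> np.ndarray:
    """Hypothetical sink score: mean attention each key position receives
    from the query positions that are allowed to attend to it."""
    T = attn.shape[0]
    n_queries = T - np.arange(T)  # key j is visible to queries j..T-1
    return attn.sum(axis=0) / n_queries


# Toy example (assumption): uniform logits, except that every query
# strongly prefers token 0, which makes token 0 behave like a sink.
T = 8
scores = np.zeros((T, T))
scores[:, 0] += 4.0

attn = causal_attention(scores)
s = sink_scores(attn)
sink_index = int(np.argmax(s))
print("sink scores:", np.round(s, 3))
print("most sink-like token:", sink_index)
```

In a real model the attention maps would be collected per layer and per head, and a detector would be trained on features derived from them; how SinkProbe aggregates those features is not specified in the text above.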
