The Map of Misbelief: Tracing Intrinsic and Extrinsic Hallucinations Through Attention Patterns

Elyes Hajji; Aymen Bouguerra; Fabio Arnez

arXiv:2511.10837 · cs.LG · November 17, 2025

Open Access

TL;DR

This paper introduces a new framework for distinguishing and detecting intrinsic and extrinsic hallucinations in large language models using attention patterns, improving interpretability and detection accuracy.

Contribution

The paper proposes a novel evaluation framework and attention aggregation methods that enhance hallucination detection and interpretability, especially for intrinsic hallucinations.

Findings

01 Sampling methods detect extrinsic hallucinations effectively.

02 Attention-based aggregation improves intrinsic hallucination detection.

03 Attention signals are valuable for quantifying model uncertainty.

Abstract

Large Language Models (LLMs) are increasingly deployed in safety-critical domains, yet remain susceptible to hallucinations. While prior work has proposed confidence representation methods for hallucination detection, most of these approaches rely on computationally expensive sampling strategies and often disregard the distinction between hallucination types. In this work, we introduce a principled evaluation framework that differentiates between extrinsic and intrinsic hallucination categories and evaluates detection performance across a suite of curated benchmarks. In addition, we leverage a recent attention-based uncertainty quantification algorithm and propose novel attention aggregation strategies that improve both interpretability and hallucination detection performance. Our experimental findings reveal that sampling-based methods like Semantic Entropy are effective for…
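For readers unfamiliar with the sampling-based baseline named in the abstract, the sketch below illustrates the general idea behind Semantic Entropy: sample several answers to the same prompt, group answers that mean the same thing, and compute the entropy of the resulting group frequencies. It is a minimal illustration, not the paper's implementation. The `normalize` helper is a hypothetical stand-in for semantic clustering (published versions of the method typically use an entailment model to decide equivalence), and the sample answers are invented for the example.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    # Hypothetical stand-in for semantic equivalence: answers are treated as
    # equivalent if they match after lowercasing and trimming punctuation.
    # A real system would use an entailment model here (assumption).
    return " ".join(answer.lower().strip(" .!?").split())


def semantic_entropy(samples: list[str]) -> float:
    """Entropy (in nats) of the empirical distribution over answer clusters."""
    clusters = Counter(normalize(s) for s in samples)
    total = sum(clusters.values())
    # sum of p * log(1/p) over clusters, with p = cluster size / sample count
    return sum((n / total) * math.log(total / n) for n in clusters.values())


# Invented sample sets: the first agrees on one answer, the second does not.
consistent = ["Paris", "paris.", "Paris "]
scattered = ["Paris", "Lyon", "Rome", "Nice"]

low = semantic_entropy(consistent)
high = semantic_entropy(scattered)
```

A low score means the sampled answers collapse into one cluster, while a high score means they disagree, which is the signal used to flag a possible hallucination. The cost noted in the abstract comes from the need to generate many samples per prompt.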

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Mental Health via Writing · Explainable Artificial Intelligence (XAI)