The Phenomenology of Hallucinations
Valeria Ruscio, Keiran Thompson

TL;DR
This paper investigates why language models hallucinate, revealing that they fail to integrate uncertainty into output generation due to weak coupling between internal uncertainty signals and the output layer, leading to confident yet incorrect outputs.
Contribution
It uncovers the internal mechanisms of uncertainty representation in language models and explains how these contribute to hallucinations, proposing causal interventions to mitigate this issue.
Findings
Uncertain inputs occupy high-dimensional regions with 2-3× the intrinsic dimensionality of factual inputs.
Uncertainty signals are weakly coupled to the output layer, causing hallucinations.
Causal interventions can restore refusal behavior by connecting uncertainty directly to logits.
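As a rough illustration of the last finding, the sketch below shows one way an uncertainty signal could be wired directly into the logits. It is a toy under stated assumptions, not the authors' intervention: the function name couple_uncertainty_to_logits, the gain alpha, the refusal token index, and the uncertainty direction are all hypothetical, and the vectors are hand-made rather than taken from a model.

```python
import numpy as np

def couple_uncertainty_to_logits(logits, hidden, direction, refusal_id, alpha=5.0):
    """Boost the refusal logit by alpha times the (non-negative) projection
    of the hidden state onto an assumed uncertainty direction."""
    unit = direction / np.linalg.norm(direction)
    score = float(hidden @ unit)              # scalar uncertainty readout
    patched = logits.copy()
    patched[refusal_id] += alpha * max(score, 0.0)
    return patched

# Toy vocabulary of four tokens; index 3 stands in for a refusal token (assumption).
REFUSAL_ID = 3
logits = np.array([2.0, 1.0, 0.0, -1.0])

# Hypothetical uncertainty direction in a 3-dimensional residual stream.
direction = np.array([1.0, 0.0, 0.0])

hidden_uncertain = np.array([1.5, 0.2, -0.3])   # large projection on the direction
hidden_confident = np.array([-0.5, 0.2, 0.3])   # no positive projection

patched_uncertain = couple_uncertainty_to_logits(logits, hidden_uncertain, direction, REFUSAL_ID)
patched_confident = couple_uncertainty_to_logits(logits, hidden_confident, direction, REFUSAL_ID)

print(int(np.argmax(patched_uncertain)), int(np.argmax(patched_confident)))
```

In this toy setup the refusal token wins only when the hidden state projects positively onto the assumed direction; the confident input keeps its original top token.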
Abstract
We show that language models hallucinate not because they fail to detect uncertainty, but because of a failure to integrate it into output generation. Across architectures, uncertain inputs are reliably identified, occupying high-dimensional regions with 2-3× the intrinsic dimensionality of factual inputs. However, this internal signal is weakly coupled to the output layer: uncertainty migrates into low-sensitivity subspaces, becoming geometrically amplified yet functionally silent. Topological analysis shows that uncertainty representations fragment rather than converging to a unified abstention state, while gradient and Fisher probes reveal collapsing sensitivity along the uncertainty direction. Because cross-entropy training provides no attractor for abstention and uniformly rewards confident prediction, associative mechanisms amplify these fractured activations until residual…
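To make the intrinsic-dimensionality comparison concrete, here is a minimal sketch using the participation ratio of the covariance spectrum. The excerpt above does not say which estimator the paper uses, so the participation ratio is a swapped-in assumption, and the "factual" and "uncertain" point clouds are synthetic stand-ins for hidden states, not model activations.

```python
import numpy as np

def participation_ratio(activations: np.ndarray) -> float:
    """Effective dimensionality (sum of eigenvalues)^2 / sum of squared
    eigenvalues of the covariance of a point cloud (rows are samples)."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the covariance eigenvalues;
    # the proportionality constant cancels in the ratio.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eig = singular_values ** 2
    return float(eig.sum() ** 2 / (eig ** 2).sum())

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): a 4-dimensional and a 12-dimensional
# latent cloud, each embedded in a 64-dimensional "hidden state" space.
basis_low = rng.standard_normal((4, 64))
basis_high = rng.standard_normal((12, 64))
factual = rng.standard_normal((500, 4)) @ basis_low
uncertain = rng.standard_normal((500, 12)) @ basis_high

pr_factual = participation_ratio(factual)
pr_uncertain = participation_ratio(uncertain)
print(f"factual: {pr_factual:.2f}, uncertain: {pr_uncertain:.2f}")
```

The participation ratio never exceeds the rank of the data, so it gives a conservative estimate; nearest-neighbour estimators may behave differently on curved manifolds.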
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Topological and Geometric Data Analysis · Face Recognition and Perception
