Attention Deficits in Language Models: Causal Explanations for Procedural Hallucinations
Ahmed Karim, Fatima Sheaib, Zein Khamis, Maggie Chlon, Jad Awada, and Leon Chlon

TL;DR
This paper investigates procedural hallucinations in large language models, revealing that many errors stem from readout-stage routing failures, and proposes interventions to mitigate these errors.
Contribution
It provides a detailed analysis of procedural hallucinations, decomposes errors into gating and binding failures, and introduces diagnostics and interventions to reduce such errors.
Findings
In the hard regime, most errors are Stage 2B (binding) failures: the model enters answer mode but selects the wrong candidate.
Correct values are encoded but often not used in output.
Interventions like oracle checkpointing significantly reduce errors.
Abstract
Large language models can follow complex procedures yet fail at a seemingly trivial final step: reporting a value they themselves computed moments earlier. We study this phenomenon as procedural hallucination: failure to execute a verifiable, prompt-grounded specification even when the correct value is present in context. In long-context binding tasks with a known single-token candidate set, we find that many errors are readout-stage routing failures. Specifically, failures decompose into Stage 2A (gating) errors, where the model does not enter answer mode, and Stage 2B (binding) errors, where it enters answer mode but selects the wrong candidate (often due to recency bias). In the hard regime, Stage 2B accounts for most errors across model families in our tasks (Table 1). On Stage 2B error trials, a linear probe on the final-layer residual stream recovers the correct value…
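The Stage 2A / Stage 2B split described in the abstract can be illustrated with a minimal sketch. It rests on an assumption that is not taken from the paper: "entering answer mode" is approximated here as the model's top-1 output token falling inside the known candidate set. The function names `classify_trial` and `error_breakdown`, and the toy trials, are hypothetical and for illustration only. The paper's actual diagnostic may differ.

```python
from collections import Counter


def classify_trial(predicted, correct, candidates):
    """Label one trial as correct, a Stage 2A error, or a Stage 2B error.

    Assumption (not from the paper): the model counts as having entered
    answer mode when its top-1 token is one of the known candidates.
    """
    if predicted == correct:
        return "correct"
    if predicted not in candidates:
        # Output is not a candidate at all: treated as a gating failure.
        return "stage_2a_gating"
    # Output is a candidate, but the wrong one: treated as a binding failure.
    return "stage_2b_binding"


def error_breakdown(trials):
    """Count outcome labels over (predicted, correct, candidates) triples."""
    return Counter(classify_trial(p, c, cands) for p, c, cands in trials)


# Toy data, invented purely for illustration.
candidates = {"A", "B", "C", "D"}
trials = [
    ("B", "B", candidates),    # correct readout
    ("D", "B", candidates),    # wrong candidate
    ("The", "B", candidates),  # non-candidate token
    ("C", "A", candidates),    # wrong candidate
]
breakdown = error_breakdown(trials)
print(dict(breakdown))
```

On this toy data the breakdown contains one correct trial, one Stage 2A error, and two Stage 2B errors. A softer variant could threshold the probability mass placed on the candidate set instead of using the top-1 token, which is likewise an assumption and not the paper's stated procedure.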
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Natural Language Processing Techniques
