Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation

G. Aytug Akarlar

arXiv:2604.15400 · cs.LG · April 20, 2026

TL;DR

This paper provides causal evidence that hallucination in autoregressive language models is an early trajectory commitment governed by asymmetric attractor dynamics: trajectories diverge early, and stable basin structures shape where they end up.

Contribution

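It introduces a causal framework for analyzing how hallucinations form in transformer language models, combining same-prompt bifurcation (repeatedly sampling an identical input to observe spontaneous divergence) with activation patching across layers.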
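A minimal sketch of the same-prompt bifurcation check, assuming a sampler and a factuality labeler are available. The callables `sample` and `is_factual` are hypothetical stand-ins for a real model call and a real fact check, and the sample count and seed are illustrative choices rather than values from the paper. Here a prompt counts as bifurcating if repeated sampling of the identical input yields at least one factual and at least one hallucinated completion.

```python
import random
from collections import Counter
from typing import Callable, Tuple


def bifurcates(
    sample: Callable[[random.Random], str],
    is_factual: Callable[[str], bool],
    n_samples: int = 20,
    seed: int = 0,
) -> Tuple[bool, Counter]:
    """Sample one fixed prompt n_samples times and report whether both factual
    and hallucinated completions appear (i.e. the prompt bifurcates)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    labels = Counter(
        "factual" if is_factual(sample(rng)) else "hallucinated"
        for _ in range(n_samples)
    )
    split = labels["factual"] > 0 and labels["hallucinated"] > 0
    return split, labels


# Hypothetical stand-ins for a real model call and a real fact checker.
def toy_sample(rng: random.Random) -> str:
    return rng.choice(["Paris", "Lyon"])


def toy_is_factual(text: str) -> bool:
    return text == "Paris"


split, counts = bifurcates(toy_sample, toy_is_factual, n_samples=50)
print(split, dict(counts))

# Bifurcation rate from the counts reported in the abstract (27 of 61 prompts).
print(round(100 * 27 / 61, 1))  # 44.3
```

Fixing the seed makes each prompt's verdict reproducible; in practice the number of samples trades off cost against the chance of missing a rare hallucinated branch.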

Findings

01

Hallucinations diverge from factual trajectories at the first generated token.

02

Injecting a hallucinated activation into a correct trajectory corrupts the output in 87.5% of trials (layer 20), while the reverse patch recovers only 33.3% (layer 24).

03

Prompt encoding predicts hallucination likelihood at early steps.
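The divergence test behind the first finding can be sketched as a per-step KL comparison of the two trajectories' next-token distributions. The exact quantity is an assumption here: the direction KL(factual || hallucinated), the use of nats, and the helper names `kl_divergence` and `first_divergent_step` are illustrative, and the toy distributions are invented; only the 1.0 threshold comes from the abstract.

```python
import math
from typing import List, Optional, Sequence, Tuple

Dist = Sequence[float]  # next-token probabilities over a shared vocabulary


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing, and eps
    keeps log() finite if q assigns zero mass where p does not."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def first_divergent_step(
    factual: Sequence[Dist], hallucinated: Sequence[Dist], threshold: float = 1.0
) -> Tuple[Optional[int], List[float]]:
    """Per-step KL between two trajectories and the first step whose KL
    exceeds threshold (None if no step does)."""
    kls = [kl_divergence(p, q) for p, q in zip(factual, hallucinated)]
    first = next((t for t, kl in enumerate(kls) if kl > threshold), None)
    return first, kls


# Toy 3-token vocabulary: identical at step 0 (same prompt), split at step 1.
factual_run = [[0.6, 0.3, 0.1], [0.90, 0.05, 0.05]]
hallucinated_run = [[0.6, 0.3, 0.1], [0.05, 0.90, 0.05]]
step, kls = first_divergent_step(factual_run, hallucinated_run)
print(step, [round(k, 3) for k in kls])  # 1 [0.0, 2.457]
```

Because both trajectories share the prompt, their step-0 distributions coincide and the KL is exactly zero; once the first sampled token differs, the conditioning contexts differ and the KL can jump past the threshold.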

Abstract

We present causal evidence that hallucination in autoregressive language models is an early trajectory commitment governed by asymmetric attractor dynamics. Using same-prompt bifurcation, in which we repeatedly sample identical inputs to observe spontaneous divergence, we isolate trajectory dynamics from prompt-level confounds. On Qwen2.5-1.5B across 61 prompts spanning six categories, 27 prompts (44.3%) bifurcate, with factual and hallucinated trajectories diverging at the first generated token (KL = 0 at step 0, KL > 1.0 at step 1). Activation patching across 28 layers reveals a pronounced causal asymmetry: injecting a hallucinated activation into a correct trajectory corrupts output in 87.5% of trials (layer 20), while the reverse recovers only 33.3% (layer 24); both exceed the 10.4% baseline (p = 0.025) and 12.5% random-patch control. Window patching shows correction requires…
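Activation patching, the intervention behind the 87.5% and 33.3% figures, caches an activation from one run and splices it into another run at a chosen layer. The sketch below shows only the mechanism, on a hypothetical three-layer toy stack of plain Python functions rather than Qwen2.5-1.5B; `forward`, `patch_changes_output`, and `flip_rate` are illustrative names, and the layer definitions are invented.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward(
    layers: Sequence[Layer],
    x: Vector,
    patch_layer: Optional[int] = None,
    patch_value: Optional[Vector] = None,
) -> Tuple[Vector, List[Vector]]:
    """Run the layer stack, caching each layer's output. If patch_layer is set,
    that layer's output is overwritten with patch_value before continuing."""
    cache: List[Vector] = []
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == patch_layer and patch_value is not None:
            h = list(patch_value)  # splice in the donor run's activation
        cache.append(h)
    return h, cache


def predicted_token(logits: Vector) -> int:
    """Greedy token choice: index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


def patch_changes_output(layers, recipient_x, donor_x, layer_idx) -> bool:
    """Cache the donor run, inject its layer_idx activation into the recipient
    run, and report whether the recipient's predicted token changes."""
    clean_logits, _ = forward(layers, recipient_x)
    _, donor_cache = forward(layers, donor_x)
    patched_logits, _ = forward(layers, recipient_x, layer_idx, donor_cache[layer_idx])
    return predicted_token(patched_logits) != predicted_token(clean_logits)


def flip_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of patching trials whose predicted token changed."""
    return sum(outcomes) / len(outcomes)


# Hypothetical 3-layer toy stack; the last layer maps to 2 "logits".
def layer0(h: Vector) -> Vector:
    return [2.0 * h[0], 2.0 * h[1]]


def layer1(h: Vector) -> Vector:
    return [h[0] + 0.5 * h[1], h[1]]


def layer2(h: Vector) -> Vector:
    return [h[0] - h[1], h[1] - h[0]]


toy_layers = [layer0, layer1, layer2]
correct_x, hallucinated_x = [1.0, 0.0], [0.0, 1.0]

# Hallucinated -> correct: does the donor activation at layer 1 flip the answer?
print(patch_changes_output(toy_layers, correct_x, hallucinated_x, layer_idx=1))  # True
```

Because each toy layer sees only the previous layer's output, a full-vector patch here always hands the output to the donor. In a real transformer, patches typically target the residual stream at selected positions, so whether a patch corrupts or recovers the output is an empirical question, which is what the per-layer rates above measure.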