When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction
Vardhan Dongre, Joseph Hsieh, Viet Dac Lai, Seunghyun Yoon, Trung Bui, Dilek Hakkani-Tür

TL;DR
This paper investigates how large language models lose the thread of instructions over multiple turns, introducing a diagnostic tool and a mechanistic framework to understand and predict these failures.
Contribution
It proposes the Goal Accessibility Ratio (GAR) and a channel-transition account to explain multi-turn interaction failures in LLMs, with insights across architectures and scales.
Findings
Attention from generated tokens to instruction tokens diminishes over turns, making the goal less accessible.
Residual goal information persists in representations even when attention closes.
Causal ablation of attention channels drastically reduces goal recall.
Abstract
Large language models can follow complex instructions in a single turn, yet over long multi-turn interactions they often lose the thread of instructions, persona, and rules. This degradation has been measured behaviorally but not mechanistically explained. We propose a channel-transition account: goal-defining tokens become less accessible through attention, while goal-related information may persist in residual representations. We introduce the Goal Accessibility Ratio (GAR), measuring attention from generated tokens to task-defining goal tokens, and combine it with sliding-window ablations and residual-stream probes. When attention to instructions closes, what survives reveals architecture. Across architectures, the transition yields qualitatively distinct failure modes: some models preserve goal-conditioned behavior at vanishing attention, others fail despite decodable residual goal…
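To make the idea behind GAR concrete, here is a minimal sketch of one way to measure attention from generated tokens to goal tokens. It is not the paper's definition, which the abstract does not spell out. It assumes the quantity of interest is the average share of attention mass that generated-token queries place on goal-token keys. The names `goal_attention_share` and `uniform_causal_attention`, and the toy uniform attention pattern, are hypothetical and used only for illustration.

```python
def uniform_causal_attention(seq_len, num_heads=1):
    """Toy attention (assumption): each query spreads its weight evenly
    over itself and all earlier keys. Indexed as attn[head][query][key]."""
    return [
        [
            [1.0 / (q + 1) if k <= q else 0.0 for k in range(seq_len)]
            for q in range(seq_len)
        ]
        for _ in range(num_heads)
    ]


def goal_attention_share(attn, goal_idx, gen_idx):
    """Average attention mass on goal-token keys, taken over all heads and
    all generated-token queries. Illustrative stand-in, not the paper's GAR."""
    shares = []
    for head in attn:
        for q in gen_idx:
            # Each attention row sums to 1, so this sum is a fraction in [0, 1].
            shares.append(sum(head[q][k] for k in goal_idx))
    return sum(shares) / len(shares)


if __name__ == "__main__":
    goal_idx = [0, 1]  # hypothetical: the first two tokens define the goal
    for seq_len in (4, 8, 16):
        attn = uniform_causal_attention(seq_len)
        last_query = [seq_len - 1]  # treat the newest token as "generated"
        print(seq_len, goal_attention_share(attn, goal_idx, last_query))
```

In this toy setting the share for a query at position `q` is `2 / (q + 1)`, so it shrinks as the conversation grows even though the goal tokens are still in context. Real models have learned, non-uniform attention, so the sketch only illustrates the bookkeeping, not the dynamics the paper reports.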
