How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing
Javier Marín

TL;DR
This paper investigates how transformer-based language models internally distinguish correct from incorrect answers, finding that they do so through rotational dynamics in their representations that emerge only once models reach a certain size (around 1.6 billion parameters).
Contribution
Introduces forced-completion probing to analyze internal dynamics, revealing rotational divergence and active suppression of incorrect answers in transformer models.
Findings
Factual correctness is encoded in the direction, not magnitude, of internal representations.
Models actively suppress incorrect answers rather than passively failing.
Factual processing capabilities emerge at around 1.6 billion parameters, indicating a phase transition.
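This summary does not say how suppression was measured. As a rough illustration only, the sketch below uses a swapped-in technique, a logit-lens readout: each layer's hidden state at the last query position is projected through the unembedding matrix, and the log-probabilities of the correct and incorrect candidate tokens are tracked by depth. A layer where the incorrect token's log-probability falls relative to the layer before is one hypothetical sign of active suppression. The function names, the parameter-free layer norm (real models apply a learned scale and bias), and the toy inputs are all assumptions, not the paper's method.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm; real models also apply a learned scale and bias
    # (simplifying assumption for this sketch).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def logit_lens_trace(hidden, W_U, correct_id, incorrect_id):
    """Read out candidate-token log-probs at every depth (logit-lens style).

    hidden: (num_layers + 1, d_model) last-query-position hidden states.
    W_U:    (d_model, vocab_size) unembedding matrix.
    """
    logp = log_softmax(layer_norm(hidden) @ W_U)  # (num_layers + 1, vocab_size)
    return logp[:, correct_id], logp[:, incorrect_id]


def suppression_layers(logp_incorrect, tol=0.0):
    """Layers where the incorrect token's log-prob drops relative to the layer before."""
    return (np.where(np.diff(logp_incorrect) < -tol)[0] + 1).tolist()


# Toy example (hypothetical values): vocab of 3, identity unembedding.
# Token 2 (incorrect) peaks at layer 1 and is pushed back down at layer 2.
W_U = np.eye(3)
hidden = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
lp_correct, lp_incorrect = logit_lens_trace(hidden, W_U, correct_id=0, incorrect_id=2)
print(suppression_layers(lp_incorrect))  # [2]
```

A drop in the incorrect token's log-probability separates "pushed down" from "never promoted"; the `tol` argument lets small fluctuations be ignored when applied to real, noisy traces.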
Abstract
When a language model is fed a wrong answer, what happens inside the network? Current understanding treats truthfulness as a static property of individual-layer representations: a direction to be probed, a feature to be extracted. Less is known about the dynamics: how internal representations diverge across the full depth of the network when the model processes correct versus incorrect continuations. We introduce forced-completion probing, a method that presents identical queries with known correct and incorrect single-token continuations and tracks five geometric measurements across every layer of four decoder-only models (1.5B to 13B parameters). We report three findings. First, correct and incorrect paths diverge through rotation, not rescaling: displacement vectors maintain near-identical magnitudes while their angular separation increases, meaning factual selection is encoded in…
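The rotation-versus-rescaling distinction can be made concrete with a minimal sketch. It assumes, hypothetically (the abstract does not define the term), that a displacement vector is the per-layer update h[l] - h[l-1] of the final-token hidden state. For each layer it reports the ratio of the correct and incorrect displacement norms and the angle between them; a ratio near 1 combined with a growing angle is the pattern the abstract describes as rotation rather than rescaling. The function name, array layout, and synthetic inputs are illustrative assumptions.

```python
import numpy as np


def displacement_geometry(h_correct, h_incorrect):
    """Compare per-layer displacement vectors of two forced completions.

    h_correct, h_incorrect: arrays of shape (num_layers + 1, d_model) holding the
    final-token hidden state at each depth (row 0 = embedding output) for the
    same query completed with the correct vs. the incorrect token.

    ASSUMPTION (illustrative): a "displacement" is the per-layer update
    h[l] - h[l - 1]; the paper's exact definition may differ.
    """
    d_c = np.diff(h_correct, axis=0)  # (num_layers, d_model)
    d_i = np.diff(h_incorrect, axis=0)
    norm_c = np.linalg.norm(d_c, axis=1)
    norm_i = np.linalg.norm(d_i, axis=1)
    # Magnitude ratio near 1 means no rescaling between the two paths.
    ratio = norm_c / norm_i
    # Angle between the two updates at each layer, in degrees.
    cos = np.sum(d_c * d_i, axis=1) / (norm_c * norm_i)
    angle_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return ratio, angle_deg


# Synthetic 2-D example of pure rotation: equal-length updates whose
# directions separate by 10, 30, then 60 degrees.
thetas = np.radians([10.0, 30.0, 60.0])
d_correct = np.tile([1.0, 0.0], (3, 1))
d_incorrect = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
h_correct = np.vstack([np.zeros(2), np.cumsum(d_correct, axis=0)])
h_incorrect = np.vstack([np.zeros(2), np.cumsum(d_incorrect, axis=0)])

ratio, angle_deg = displacement_geometry(h_correct, h_incorrect)
```

In this synthetic case the magnitude ratio stays at 1 at every layer while the angle grows from 10 to 60 degrees. Scaling one path instead (for example, comparing `h_correct` with `2 * h_correct`) gives the opposite signature: a ratio of 0.5 and an angle of 0.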
