Simulated Adoption: Decoupling Magnitude and Direction in LLM In-Context Conflict Resolution
Long Zhang, Fangwei Lin

TL;DR
This paper analyzes the geometry of internal representations to study how large language models resolve conflicts between in-context information and parametric memory. It finds that compliance is realized through vector rotation rather than magnitude suppression, which affects how model compliance can be detected and understood.
Contribution
The study provides a layer-wise geometric analysis across multiple LLMs, showing that compliance arises from orthogonal vector interference rather than signal dilution and challenging the universality of the "Manifold Dilution" hypothesis.
Findings
Two of the three architectures maintain stable residual norms despite performance drops.
Compliance involves quasi-orthogonal vector rotations.
Scalar confidence metrics are insufficient for detecting hallucinations.
Abstract
Large Language Models (LLMs) frequently prioritize conflicting in-context information over pre-existing parametric memory, a phenomenon often termed sycophancy or compliance. However, the mechanism behind this behavior remains obscure: it is unclear how the model resolves these knowledge conflicts through compliance, and whether the suppression of parametric memory arises from signal magnitude dilution or from directional geometric alteration within the residual stream. To resolve this, we conducted a layer-wise geometric analysis across Qwen-3-4B, Llama-3.1-8B, and GLM-4-9B, decomposing the residual stream updates induced by counterfactual contexts into radial (norm-based) and angular (cosine-based) components. Our empirical results reject the universality of the "Manifold Dilution" hypothesis, as two of the three architectures maintained stable residual norms despite exhibiting significant…
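The radial/angular decomposition described in the abstract can be illustrated with a minimal sketch. The excerpt does not give the paper's exact formulation, so this version assumes the simplest reading: for each layer, compare the residual vector from a baseline prompt with the one from a counterfactual-context prompt, using their norm ratio as the radial component and their cosine similarity as the angular component. The function name `radial_angular`, the array shapes, and the toy vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def radial_angular(h_base, h_ctx, eps=1e-12):
    """Per-layer radial and angular comparison of two residual-stream states.

    h_base, h_ctx: arrays of shape (num_layers, d_model), assumed to hold the
    residual vector at one token position without and with the conflicting
    context (hypothetical inputs, not the paper's data format).
    Returns (norm_ratio, cosine), each of shape (num_layers,).
    """
    h_base = np.asarray(h_base, dtype=np.float64)
    h_ctx = np.asarray(h_ctx, dtype=np.float64)
    norm_base = np.linalg.norm(h_base, axis=-1)
    norm_ctx = np.linalg.norm(h_ctx, axis=-1)
    # Radial component: how much the magnitude changed.
    norm_ratio = norm_ctx / (norm_base + eps)
    # Angular component: how much the direction changed.
    cosine = np.sum(h_base * h_ctx, axis=-1) / (norm_base * norm_ctx + eps)
    return norm_ratio, cosine

# Toy two-layer example (made-up vectors):
# layer 0 is a pure shrink (same direction, half the norm),
# layer 1 is a pure rotation (same norm, orthogonal direction).
h_base = np.array([[3.0, 4.0], [3.0, 4.0]])
h_ctx = np.array([[1.5, 2.0], [-4.0, 3.0]])
norm_ratio, cosine = radial_angular(h_base, h_ctx)
```

Under this reading, a dilution-style effect would appear as a norm ratio well below 1 with cosine near 1 (layer 0 in the toy data), while a rotation-style effect would appear as a norm ratio near 1 with cosine near 0 (layer 1).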
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
