On the Geometric Structure of Layer Updates in Deep Language Models
Jun-Sik Yoo

TL;DR
This paper investigates the geometric structure of layer updates in deep language models, revealing a dominant tokenwise component and a distinct residual that influences model behavior.
Contribution
It introduces a framework to analyze layer update geometry, showing the residual's distinctness and its functional impact across various architectures.
Findings
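The full layer update is almost perfectly aligned with the tokenwise component.
The residual is geometrically distinct: it shows substantially weaker alignment, larger angular deviation, and lower projection onto the dominant tokenwise subspace.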
Approximation error correlates strongly with output perturbation.
Abstract
We study the geometric structure of layer updates in deep language models. Rather than analyzing what information is encoded in intermediate representations, we ask how representations change from one layer to the next. We show that layerwise updates admit a decomposition into a dominant tokenwise component and a residual that is not captured by restricted tokenwise function classes. Across multiple architectures, including Transformers and state-space models, we find that the full layer update is almost perfectly aligned with the tokenwise component, while the residual exhibits substantially weaker alignment, larger angular deviation, and significantly lower projection onto the dominant tokenwise subspace. This indicates that the residual is not merely a small correction, but a geometrically distinct component of the transformation. This geometric separation has functional…
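To make the decomposition described in the abstract concrete, here is a minimal sketch on synthetic data. It is not the paper's procedure: it assumes the "restricted tokenwise function class" is a single linear map fitted by least squares and applied to each token independently, and it stands in for cross-token mixing with a simple shifted copy of the input. The names `decompose_update` and `mean_cosine` and all shapes and constants are hypothetical choices made for illustration.

```python
import numpy as np

def decompose_update(h_in, h_out):
    """Split a layer update into a tokenwise linear fit and a residual.

    Assumption: the tokenwise class is one linear map W shared by all
    tokens; the paper may use a different (e.g. nonlinear) class.
    """
    delta = h_out - h_in                               # full update, (tokens, d)
    W, *_ = np.linalg.lstsq(h_in, delta, rcond=None)   # least-squares fit, (d, d)
    tokenwise = h_in @ W                               # part explained per token
    residual = delta - tokenwise                       # part the fit misses
    return delta, tokenwise, residual

def mean_cosine(a, b, eps=1e-12):
    """Mean per-token cosine similarity between rows of a and b."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return float(np.mean(num / den))

# Synthetic "layer": a tokenwise linear map plus a small cross-token term.
rng = np.random.default_rng(0)
n_tokens, d = 512, 16
h_in = rng.normal(size=(n_tokens, d))
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
mix = 0.1 * np.roll(h_in, 1, axis=0)   # hypothetical stand-in for token mixing
h_out = h_in + h_in @ W_true + mix

delta, tokenwise, residual = decompose_update(h_in, h_out)
align_full = mean_cosine(delta, tokenwise)      # full update vs. tokenwise part
align_resid = mean_cosine(residual, tokenwise)  # residual vs. tokenwise part
print(f"full vs tokenwise:     {align_full:.3f}")
print(f"residual vs tokenwise: {align_resid:.3f}")
```

In this toy setting the full update should align closely with the tokenwise fit while the residual should not, which mirrors the qualitative pattern the abstract reports. The numbers say nothing about real models; with actual hidden states, `h_in` and `h_out` would be the activations entering and leaving a layer.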
