Dynamics of the Transformer Residual Stream: Coupling Spectral Geometry to Network Topology
Jesseba Fernando, Grigori Guitchounts

TL;DR
This paper analyzes the spectral geometry of the residual stream in large language models, showing how training shapes the spectral properties and network topology that govern how computation and perturbations propagate through depth.
Contribution
It provides the first full Jacobian eigendecomposition analysis across three production-scale LLMs, linking spectral geometry, network topology, and learned dynamics.
Findings
Training induces a monotonic spectral gradient through depth, from non-normal, rotation-dominated early layers to near-symmetric late layers.
A cumulative low-rank bottleneck funnels perturbations into a small fraction of the residual stream's effective dimensions.
Topological community structure predicts the Jacobian's amplification or suppression effects.
Abstract
Large language models are remarkably capable, yet how computation propagates through their layers remains poorly understood. A growing line of work treats depth as discrete time and the residual stream as a dynamical system, where each layer's nonlinear update has a local linear description. However, previous analyses have relied on scalar summaries or approximate linearizations, leaving the full spectral geometry of trained LLMs unknown. We perform full Jacobian eigendecomposition across three production-scale LLMs and show that training installs a monotonic spectral gradient through depth, from non-normal, rotation-dominated early layers to near-symmetric late layers, together with a cumulative low-rank bottleneck that funnels perturbations into a small fraction of the residual stream's effective dimensions. Our experiments reveal that this gradient and the dimensional collapse…
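To make the abstract's vocabulary concrete, here is a minimal sketch of what a per-layer Jacobian eigendecomposition can look like on a toy residual update. It is not the authors' pipeline. The layer form `x + w_out @ tanh(w_in @ x)`, the random weights, the commutator-based non-normality score, and the participation ratio used as an "effective dimension" are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np


def toy_layer(x, w_in, w_out):
    """Hypothetical residual update: x + w_out @ tanh(w_in @ x)."""
    return x + w_out @ np.tanh(w_in @ x)


def toy_layer_jacobian(x, w_in, w_out):
    """Analytic Jacobian of toy_layer at x: I + w_out @ diag(1 - tanh^2) @ w_in."""
    gate = 1.0 - np.tanh(w_in @ x) ** 2  # elementwise derivative of tanh
    return np.eye(x.size) + w_out @ (gate[:, None] * w_in)


def non_normality(jac):
    """Commutator score ||J J^T - J^T J||_F / ||J||_F^2; zero iff the real matrix J is normal."""
    comm = jac @ jac.T - jac.T @ jac
    return np.linalg.norm(comm) / np.linalg.norm(jac) ** 2


def participation_ratio(jac):
    """(sum s_i^2)^2 / sum s_i^4 over singular values: a soft count of active directions."""
    s2 = np.linalg.svd(jac, compute_uv=False) ** 2
    return s2.sum() ** 2 / (s2 ** 2).sum()


# Assumed toy sizes and random weights; a real analysis would use trained model weights.
rng = np.random.default_rng(0)
d, h = 16, 32
x = rng.normal(size=d)
w_in = rng.normal(size=(h, d)) / np.sqrt(d)
w_out = rng.normal(size=(d, h)) / np.sqrt(h)

jac = toy_layer_jacobian(x, w_in, w_out)
eigvals = np.linalg.eigvals(jac)

print("spectral radius:", np.abs(eigvals).max())
print("fraction of complex eigenvalues:", np.mean(np.abs(eigvals.imag) > 1e-9))
print("non-normality score:", non_normality(jac))
print("participation ratio:", participation_ratio(jac), "of", d)
```

In this sketch, a symmetric Jacobian scores zero on non-normality and has purely real eigenvalues, while complex eigenvalue pairs indicate a rotational component. A participation ratio well below `d` would indicate that perturbations are concentrated in a few directions.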
