Dynamics of the Transformer Residual Stream: Coupling Spectral Geometry to Network Topology

Jesseba Fernando; Grigori Guitchounts

arXiv:2605.14258·cs.LG·May 15, 2026

TL;DR

This paper analyzes the spectral geometry of the residual stream in large language models, showing how training shapes layer-wise Jacobian spectra and how those spectra relate to network topology and to the way perturbations propagate through depth.

Contribution

It provides the first full Jacobian eigendecomposition analysis across large models, linking spectral geometry, network topology, and learned dynamics.

Findings

01. Training induces a spectral gradient through depth, from non-normal early layers to near-symmetric late layers.

02. A cumulative low-rank bottleneck funnels perturbations into a few effective dimensions.

03. Topological community structure predicts whether the Jacobian amplifies or suppresses perturbations.
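The first two findings refer to non-normality and effective dimensionality of a layer Jacobian. The listing does not say which metrics the paper uses, so the sketch below picks two common ones as assumptions: a normalized commutator norm (zero exactly when the matrix is normal, which includes symmetric matrices) and the participation ratio of the squared singular values as an effective rank. The function names `henrici_departure` and `participation_ratio` are illustrative, not taken from the paper.

```python
import numpy as np

def henrici_departure(J):
    """Normalized commutator norm ||J J^T - J^T J||_F / ||J||_F^2.

    Zero if and only if J is normal (symmetric matrices are a special case).
    Assumed metric for illustration; the paper may use a different one.
    """
    comm = J @ J.T - J.T @ J
    return np.linalg.norm(comm, "fro") / np.linalg.norm(J, "fro") ** 2

def participation_ratio(J):
    """Effective rank (sum s_i^2)^2 / sum s_i^4 over singular values s_i.

    Equals n for an n x n identity and 1 for a rank-one matrix.
    """
    s2 = np.linalg.svd(J, compute_uv=False) ** 2
    return s2.sum() ** 2 / (s2 ** 2).sum()

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))

S = A + A.T                               # symmetric, hence normal
N = np.array([[0.0, 1.0], [0.0, 0.0]])    # nilpotent shear, strongly non-normal
R = np.outer(rng.normal(size=6), rng.normal(size=6))  # rank-one map

print("departure, symmetric:", henrici_departure(S))
print("departure, shear:    ", henrici_departure(N))
print("effective rank, identity:", participation_ratio(np.eye(6)))
print("effective rank, rank-one:", participation_ratio(R))
```

A symmetric matrix scores essentially zero on the first metric and a shear scores high, which is the contrast finding 01 describes across depth. Note that a pure rotation is itself normal, so a rotation-dominated Jacobian only registers as non-normal once it is combined with non-uniform scaling.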

Abstract

Large language models are remarkably capable, yet how computation propagates through their layers remains poorly understood. A growing line of work treats depth as discrete time and the residual stream as a dynamical system, where each layer's nonlinear update has a local linear description. However, previous analyses have relied on scalar summaries or approximate linearizations, leaving the full spectral geometry of trained LLMs unknown. We perform full Jacobian eigendecomposition across three production-scale LLMs and show that training installs a monotonic spectral gradient through depth (from non-normal, rotation-dominated early layers to near-symmetric late layers), together with a cumulative low-rank bottleneck that funnels perturbations into a small fraction of the residual stream's effective dimensions. Our experiments reveal that this gradient and the dimensional collapse…
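To make "each layer's nonlinear update has a local linear description" concrete, here is a minimal sketch on a toy residual layer. Everything in it is an assumption for illustration: the layer form `x + W2 @ tanh(W1 @ x)`, the sizes, the random weights, and the asymmetry score are hypothetical stand-ins, not the paper's models or pipeline. It builds the layer Jacobian analytically, checks it against central finite differences, and takes the full eigendecomposition.

```python
import numpy as np

def residual_layer(x, W1, W2):
    # Toy residual update: the stream plus a small MLP branch (assumed form).
    return x + W2 @ np.tanh(W1 @ x)

def layer_jacobian(x, W1, W2):
    # d/dx [x + W2 tanh(W1 x)] = I + W2 diag(1 - tanh^2(W1 x)) W1
    gate = 1.0 - np.tanh(W1 @ x) ** 2
    return np.eye(x.size) + W2 @ (gate[:, None] * W1)

def numerical_jacobian(f, x, eps=1e-6):
    # Central finite differences, one input coordinate at a time.
    d = x.size
    J = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        J[:, i] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
d, hidden = 8, 16
W1 = rng.normal(scale=0.3, size=(hidden, d))
W2 = rng.normal(scale=0.3, size=(d, hidden))
x = rng.normal(size=d)

J = layer_jacobian(x, W1, W2)
J_fd = numerical_jacobian(lambda v: residual_layer(v, W1, W2), x)

# Full spectrum of the local linear map; complex pairs indicate rotation.
eigvals = np.linalg.eigvals(J)

# Relative size of the antisymmetric part: 0 for a symmetric Jacobian.
asymmetry = np.linalg.norm(J - J.T) / (2 * np.linalg.norm(J))

print("max |analytic - finite difference|:", np.abs(J - J_fd).max())
print("largest |eigenvalue|:", np.abs(eigvals).max())
print("asymmetry score:", asymmetry)
```

For a real model the same Jacobian would more likely come from automatic differentiation at a chosen token position, and the residual stream is wide enough that a dense eigendecomposition per layer becomes the main cost.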
