Revisiting Anisotropy in Language Transformers: The Geometry of Learning Dynamics
Raphael Bernas, Fanny Jourdan, Antonin Poché, Céline Hudelot

TL;DR
This paper investigates the geometric causes of anisotropy in language transformers, combining theoretical insights with empirical analysis using concept-based interpretability to understand learning dynamics.
Contribution
It extends theoretical understanding of anisotropy by linking it to frequency-biased sampling and tangent directions, supported by empirical validation using activation-derived tangent proxies.
Findings
Activation-derived tangent directions capture large gradient energy.
These directions account for a larger share of gradient anisotropy.
Empirical results support a tangent-aligned explanation of anisotropy.
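As a rough illustration of what "capturing gradient energy" can mean, the sketch below measures the fraction of squared gradient norm that falls inside a low-rank subspace fitted from activations. It is a minimal sketch on synthetic data, not the authors' procedure: the paper fits its proxies with concept-based interpretability, whereas this example swaps in a plain truncated SVD, and the helper names (`top_k_basis`, `energy_fraction`) and all sizes are hypothetical.

```python
import numpy as np


def top_k_basis(activations: np.ndarray, k: int) -> np.ndarray:
    """Return a (d, k) orthonormal basis for the top-k directions of the activations.

    Assumption: truncated SVD stands in for the concept-based fit used in the paper.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def energy_fraction(gradients: np.ndarray, basis: np.ndarray) -> float:
    """Share of total squared gradient norm lying in span(basis); basis must be orthonormal."""
    projected = gradients @ basis  # coordinates of each gradient in the subspace
    return float(np.sum(projected ** 2) / np.sum(gradients ** 2))


# Synthetic setup (hypothetical): activations and gradients share a k-dim subspace.
rng = np.random.default_rng(0)
d, k, n = 64, 4, 512
true_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
activations = 5.0 * (rng.normal(size=(n, k)) @ true_basis.T) + 0.1 * rng.normal(size=(n, d))
gradients = rng.normal(size=(n, k)) @ true_basis.T + 0.1 * rng.normal(size=(n, d))

# Energy captured by the activation-derived subspace versus a random one of equal rank.
frac = energy_fraction(gradients, top_k_basis(activations, k))
random_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
baseline = energy_fraction(gradients, random_basis)
print(f"activation-derived: {frac:.3f}, random baseline: {baseline:.3f}")
```

Comparing against a random subspace of the same rank gives a baseline: a random rank-k subspace in d dimensions is expected to capture about k/d of the energy, so a markedly higher share for the activation-derived basis indicates alignment.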
Abstract
Since their introduction, Transformer architectures have dominated Natural Language Processing (NLP). However, recent research has highlighted an inherent anisotropy phenomenon in these models, presenting a significant challenge to their geometric interpretation. Previous theoretical studies on this phenomenon are rarely grounded in the underlying representation geometry. In this paper, we extend them by deriving geometric arguments for how frequency-biased sampling attenuates curvature visibility and why training preferentially amplifies tangent directions. Empirically, we then use concept-based mechanistic interpretability during training, rather than only post hoc, to fit activation-derived low-rank tangent proxies and test them against ordinary backpropagated true gradients. Across encoder-style and decoder-style language models, we find that these activation-derived directions…
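For readers new to the term, anisotropy is often quantified as the average cosine similarity between distinct representation vectors: values near 0 suggest directions spread evenly, values near 1 suggest vectors crowded into a narrow cone. The sketch below uses that common measure on synthetic data. Whether the paper uses this exact metric is an assumption, and the function name `mean_pairwise_cosine` is hypothetical.

```python
import numpy as np


def mean_pairwise_cosine(x: np.ndarray) -> float:
    """Average cosine similarity over all ordered pairs of distinct rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = x.shape[0]
    off_diagonal_sum = sims.sum() - np.trace(sims)  # drop self-similarities
    return float(off_diagonal_sum / (n * (n - 1)))


rng = np.random.default_rng(0)
isotropic = rng.normal(size=(200, 32))   # directions spread roughly uniformly
shifted = isotropic + 5.0 * np.ones(32)  # a shared offset pushes vectors into a cone

iso_score = mean_pairwise_cosine(isotropic)
aniso_score = mean_pairwise_cosine(shifted)
print(f"isotropic: {iso_score:.3f}, shifted: {aniso_score:.3f}")
```

The shared offset in `shifted` is only a toy stand-in for a dominant common direction; it is not meant to model the frequency-biased sampling mechanism discussed in the abstract.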
