The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models
Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

TL;DR
This paper explores the anisotropy and intrinsic dimensions of transformer embeddings, revealing distinct layer-wise patterns and training dynamics that enhance understanding of encoder and decoder behaviors.
Contribution
It provides novel insights into the anisotropy profiles and intrinsic dimension evolution in transformer models, highlighting differences between encoders and decoders.
Findings
Decoders show a bell-shaped anisotropy profile with middle-layer peaks.
Intrinsic dimension increases early in training, then decreases as training progresses.
Distinct anisotropy patterns differentiate encoder and decoder embeddings.
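To make "anisotropy" concrete, here is a minimal sketch of one common singular-value-based measure: the share of total variance that falls along the single dominant direction of a centered embedding matrix. This summary does not state the exact formula the authors use, so the definition, the function name `anisotropy`, and the synthetic data below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def anisotropy(embeddings: np.ndarray) -> float:
    """Fraction of total variance along the top singular direction.

    Assumed definition (not confirmed by the summary):
    sigma_1^2 / sum_i sigma_i^2 of the centered embedding matrix.
    Values near 1/d mean roughly isotropic; values near 1 mean
    the embeddings are dominated by a single direction.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    X = X - X.mean(axis=0, keepdims=True)       # center each feature
    s = np.linalg.svd(X, compute_uv=False)      # singular values, descending
    total = np.sum(s ** 2)
    if total == 0.0:                            # all rows identical
        return 0.0
    return float(s[0] ** 2 / total)


# Synthetic stand-ins for one layer's embeddings (hypothetical data).
rng = np.random.default_rng(0)
isotropic = rng.standard_normal((2000, 16))     # no preferred direction
stretched = isotropic.copy()
stretched[:, 0] *= 20.0                         # one dominant direction

a_iso = anisotropy(isotropic)
a_str = anisotropy(stretched)
print(f"isotropic: {a_iso:.3f}  stretched: {a_str:.3f}")
```

Applying a function like this to the embeddings of each layer in turn would yield a layer-wise profile of the kind the findings describe.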
Abstract
In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we found that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. This is followed by a compression phase towards the end of training, in which dimensionality decreases, suggesting a refinement into more compact representations. Our results provide fresh insights into the embedding properties of encoders and decoders.
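Intrinsic dimension can be estimated from nearest-neighbor distances alone. The sketch below uses the TwoNN maximum-likelihood estimator, which relies only on the ratio of each point's second to first nearest-neighbor distance. The abstract does not say which estimator the authors use, so this choice, the function name `twonn_intrinsic_dimension`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def twonn_intrinsic_dimension(points: np.ndarray) -> float:
    """TwoNN maximum-likelihood estimate: d = N / sum(log(r2 / r1)).

    r1 and r2 are each point's distances to its first and second
    nearest neighbors. Uses a dense N x N distance matrix, so it is
    only suitable for a few thousand points.
    """
    X = np.asarray(points, dtype=np.float64)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.maximum(d2, 0.0, out=d2)                       # clip rounding noise
    np.fill_diagonal(d2, np.inf)                      # exclude self-matches
    two = np.partition(d2, 1, axis=1)[:, :2]          # two smallest per row
    r1 = np.sqrt(two[:, 0])
    r2 = np.sqrt(two[:, 1])
    keep = r1 > 0                                     # skip duplicate points
    mu = r2[keep] / r1[keep]
    return float(keep.sum() / np.sum(np.log(mu)))


# Toy check: a 2-D sheet embedded isometrically in 10-D (hypothetical data).
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((10, 2)))  # orthonormal columns
sheet = rng.random((1000, 2)) @ basis.T                # shape (1000, 10)

id_est = twonn_intrinsic_dimension(sheet)
print(f"estimated intrinsic dimension: {id_est:.2f}")
```

Although the points live in 10 ambient dimensions, the estimate should land near 2, the dimension of the sheet. Running such an estimator on embeddings saved at successive training checkpoints is one way to trace the expand-then-compress trend the abstract reports.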
Taxonomy
Topics: Neural dynamics and brain function
