The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

Anton Razzhigaev; Matvey Mikhalchuk; Elizaveta Goncharova; Ivan Oseledets; Denis Dimitrov; Andrey Kuznetsov

arXiv:2311.05928·cs.CL·February 27, 2024·1 cites

Open Access

TL;DR

This paper explores the anisotropy and intrinsic dimensions of transformer embeddings, revealing distinct layer-wise patterns and training dynamics that enhance understanding of encoder and decoder behaviors.

Contribution

It provides novel insights into the anisotropy profiles and intrinsic dimension evolution in transformer models, highlighting differences between encoders and decoders.

Findings

01

Decoders show a bell-shaped anisotropy profile with middle-layer peaks.

02

Intrinsic dimension increases early in training, then decreases as training progresses.

03

Distinct anisotropy patterns differentiate encoder and decoder embeddings.

Abstract

In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we found that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. This is followed by a compression phase towards the end of training, in which dimensionality decreases, suggesting a refinement into more compact representations. Our results provide fresh insights into the embedding properties of encoders and decoders.
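
The two quantities named in the abstract can be illustrated with a short sketch. The definitions used here are assumptions, since this page does not state them: anisotropy is taken as the share of variance along the top singular direction of the centered embedding matrix, and intrinsic dimension is estimated with a TwoNN-style maximum-likelihood estimator based on the ratio of each point's two nearest-neighbor distances. The function names and the synthetic data are hypothetical and chosen only for illustration; this is not the authors' code.

```python
import numpy as np


def anisotropy(embeddings: np.ndarray) -> float:
    """Share of total variance along the top singular direction (assumed definition)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    squared = singular_values ** 2
    return float(squared[0] / squared.sum())


def two_nn_intrinsic_dimension(embeddings: np.ndarray) -> float:
    """TwoNN-style estimate from nearest-neighbor distance ratios (assumed estimator)."""
    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq_norms = (embeddings ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embeddings @ embeddings.T
    sq_dists = np.maximum(sq_dists, 0.0)   # clip tiny negatives from rounding
    np.fill_diagonal(sq_dists, np.inf)     # exclude each point's distance to itself

    # Distances to the first and second nearest neighbors of every point.
    nearest = np.sqrt(np.sort(sq_dists, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]

    valid = r1 > 0                         # skip exact duplicate points
    mu = r2[valid] / r1[valid]
    # Maximum-likelihood estimate: d = N / sum(log(mu)).
    return float(valid.sum() / np.log(mu).sum())


# Synthetic stand-ins for one layer's embeddings (hypothetical data).
rng = np.random.default_rng(0)

isotropic = rng.normal(size=(1000, 10))
stretched = isotropic.copy()
stretched[:, 0] *= 20.0                    # one dominant direction

# Points on a 2-dimensional linear subspace embedded in 10 dimensions.
planar = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))

print("anisotropy (isotropic):", anisotropy(isotropic))
print("anisotropy (stretched):", anisotropy(stretched))
print("intrinsic dimension (planar):", two_nn_intrinsic_dimension(planar))
```

Applied per layer and per training checkpoint, measurements of this kind would trace the layer-wise anisotropy profile and the rise and fall of intrinsic dimension that the abstract describes. The dense pairwise distance matrix keeps the sketch short but scales quadratically with the number of points, so larger embedding sets would need a nearest-neighbor index instead.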

Taxonomy

Topics: Neural dynamics and brain function