Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni

TL;DR
This paper identifies a phase of high intrinsic dimensionality in language transformers. In this phase, representations first capture a full linguistic abstraction of the input, first transfer viably to downstream tasks, and predict each other across different models. An earlier onset of the phase predicts better language modeling performance.
Contribution
The paper applies a high-level geometric analysis to five pre-trained transformer LMs and three input datasets, revealing a phase of high intrinsic dimensionality that is linked to linguistic abstraction and to the transferability of representations.
Findings
High intrinsic dimensionality characterizes the abstraction phase.
Representations in this phase are the first to transfer viably to downstream tasks, and they predict each other across different LMs.
Earlier phase onset predicts better language modeling performance.
Abstract
A language model (LM) is a mapping from a linguistic context to an output token. However, much remains to be known about this mapping, including how its geometric properties relate to its function. We take a high-level geometric approach to its analysis, observing, across five pre-trained transformer-based LMs and three input datasets, a distinct phase characterized by high intrinsic dimensionality. During this phase, representations (1) correspond to the first full linguistic abstraction of the input; (2) are the first to viably transfer to downstream tasks; (3) predict each other across different LMs. Moreover, we find that an earlier onset of the phase strongly predicts better language modelling performance. In short, our results suggest that a central high-dimensionality phase underlies core linguistic processing in many common LM architectures.
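To make "intrinsic dimensionality" concrete, the following is a minimal sketch of one common estimator, TwoNN, which infers dimension from the ratio of each point's second- to first-nearest-neighbor distance. It is an illustration only: the abstract does not say which estimator the authors use, and theirs may differ. The function name `twonn_id` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def twonn_id(X):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula.

    X: array of shape (n_points, n_features), e.g. one layer's hidden states.
    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via the Gram matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)          # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)      # exclude each point's distance to itself
    # Two smallest distances per row: first and second nearest neighbors.
    nearest = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0                     # drop exact duplicates
    mu = r2[keep] / r1[keep]
    # Under TwoNN assumptions log(mu) is exponential with rate d,
    # so the maximum-likelihood estimate is n / sum(log(mu)).
    return keep.sum() / np.sum(np.log(mu))

# Synthetic check: a 2-dimensional plane embedded linearly in 10 dimensions.
rng = np.random.default_rng(0)
points = rng.normal(size=(800, 2)) @ rng.normal(size=(2, 10))
print(twonn_id(points))  # should be close to 2, far below the ambient 10
```

Applying such an estimator to the hidden states of each layer in turn would yield a per-layer dimensionality profile, which is the kind of curve in which a high-dimensional phase would appear as a peak.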
Taxonomy
Topics: Language and cultural evolution · Linguistics and Cultural Studies · Natural Language Processing Techniques
