Less is More: Local Intrinsic Dimensions of Contextual Language Models
Benjamin Matthias Ruppik, Julius von Rohrscheidt, Carel van Niekerk, Michael Heck, Renato Vukovic, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Bastian Rieck, Marcus Zibrowius, Milica Ga\v{s}i\'c

TL;DR
This paper investigates how the geometric properties of contextual embeddings in language models, specifically local intrinsic dimensions, change during training and fine-tuning, providing insights into model behavior, overfitting, and generalization.
Contribution
It introduces a novel geometric perspective using local intrinsic dimensions to analyze training dynamics and fine-tuning effects in language models.
Findings
Local dimensions predict training exhaustion and overfitting.
Reductions in mean local dimension correlate with performance improvements.
The approach offers practical heuristics for model fine-tuning.
Abstract
Understanding the internal mechanisms of large language models (LLMs) remains a challenging and complex endeavor. Even fundamental questions, such as how fine-tuning affects model behavior, often require extensive empirical evaluation. In this paper, we introduce a novel perspective based on the geometric properties of contextual latent embeddings to study the effects of training and fine-tuning. To that end, we measure the local dimensions of a contextual language model's latent space and analyze their shifts during training and fine-tuning. We show that the local dimensions provide insights into the model's training dynamics and generalization ability. Specifically, the mean of the local dimensions predicts when the model's training capabilities are exhausted, as exemplified in a dialogue state tracking task, overfitting, as demonstrated in an emotion recognition task, and grokking,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling
