Less is More: Local Intrinsic Dimensions of Contextual Language Models

Benjamin Matthias Ruppik; Julius von Rohrscheidt; Carel van Niekerk; Michael Heck; Renato Vukovic; Shutong Feng; Hsien-chin Lin; Nurul Lubis; Bastian Rieck; Marcus Zibrowius; Milica Gašić

arXiv:2506.01034 · cs.CL · October 28, 2025

TL;DR

This paper investigates how the geometric properties of contextual embeddings in language models, specifically local intrinsic dimensions, change during training and fine-tuning, providing insights into model behavior, overfitting, and generalization.

Contribution

The paper introduces a geometric perspective that uses local intrinsic dimensions to analyze training dynamics and fine-tuning effects in language models.

Findings

01. Local dimensions predict training exhaustion and overfitting.

02. Reductions in mean local dimension correlate with performance improvements.

03. The approach offers practical heuristics for model fine-tuning.

Abstract

Understanding the internal mechanisms of large language models (LLMs) remains a challenging and complex endeavor. Even fundamental questions, such as how fine-tuning affects model behavior, often require extensive empirical evaluation. In this paper, we introduce a novel perspective based on the geometric properties of contextual latent embeddings to study the effects of training and fine-tuning. To that end, we measure the local dimensions of a contextual language model's latent space and analyze their shifts during training and fine-tuning. We show that the local dimensions provide insights into the model's training dynamics and generalization ability. Specifically, the mean of the local dimensions predicts when the model's training capabilities are exhausted, as exemplified in a dialogue state tracking task, overfitting, as demonstrated in an emotion recognition task, and grokking,…
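To make the idea of "measuring local dimensions" concrete, here is a minimal sketch of one standard local estimator, the Levina-Bickel maximum-likelihood estimate computed on each point's k nearest neighbours. This is an illustrative stand-in and not necessarily the estimator used in the paper. The function name `local_intrinsic_dimensions`, the neighbourhood size `k`, and the assumption that embeddings arrive as a NumPy array of shape (n_points, n_features) are all hypothetical choices made for this example.

```python
import numpy as np


def local_intrinsic_dimensions(embeddings: np.ndarray, k: int = 20) -> np.ndarray:
    """Levina-Bickel MLE of the local dimension at each row of `embeddings`.

    Hypothetical helper for illustration; the paper's estimator may differ.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    n = x.shape[0]
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < number of points")

    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_norms = np.sum(x * x, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    np.fill_diagonal(sq_dists, np.inf)       # a point is not its own neighbour

    # Distances to the k nearest neighbours, sorted ascending per row.
    knn_sq = np.sort(np.partition(sq_dists, k - 1, axis=1)[:, :k], axis=1)
    knn = np.maximum(np.sqrt(knn_sq), 1e-12)  # guard against duplicate points

    # MLE: inverse of the mean log-ratio between the k-th neighbour distance
    # and each of the closer neighbour distances.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])  # shape (n, k - 1)
    return (k - 1) / np.sum(log_ratios, axis=1)


if __name__ == "__main__":
    # Toy data: a 2-D plane linearly embedded in 10 dimensions.
    rng = np.random.default_rng(0)
    points = rng.uniform(size=(1000, 2)) @ rng.normal(size=(2, 10))
    dims = local_intrinsic_dimensions(points, k=20)
    print("mean local dimension:", dims.mean())
```

Under these assumptions, the quantity tracked across checkpoints would be `dims.mean()` computed on the same set of token embeddings before and after fine-tuning. The dense distance matrix keeps the sketch short but scales quadratically in the number of points, so a nearest-neighbour index would be a more practical choice for large embedding sets.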

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling