Tracing the complexity profiles of different linguistic phenomena through the intrinsic dimension of LLM representations

Marco Baroni; Emily Cheng; Iria de-Dios-Flores; Francesca Franzon

arXiv:2601.03779·cs.CL·April 27, 2026

TL;DR

This paper investigates whether the intrinsic dimension (ID) of large language model (LLM) representations tracks linguistic complexity, finding consistent patterns across six models and three linguistic contrasts.

Contribution

It demonstrates that intrinsic dimension differences reflect linguistic complexity contrasts and vary across layers, providing a new marker for analyzing LLMs.
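The layer-wise comparison described above can be illustrated with a small sketch. It assumes you already have one ID estimate per layer for the more complex condition of a contrast and one for the simpler condition, for example center-embedding vs. right-branching. The hypothetical helper `id_gap_profile` then computes three things: the per-layer ID difference, the first layer where that difference exceeds a chosen threshold, and the layer where it peaks. The threshold is only an illustrative assumption. This page does not say how the paper decides when a difference "emerges", and a real analysis would rely on a statistical test rather than a fixed cutoff.

```python
import numpy as np

def id_gap_profile(id_complex, id_simple, threshold=0.0):
    """Compare two per-layer ID profiles (hypothetical helper, illustration only).

    id_complex, id_simple: sequences of ID estimates, one value per layer.
    threshold: illustrative cutoff for when the gap counts as "emerged".
    Returns (gap, onset_layer, peak_layer); onset_layer is None if the gap
    never exceeds the threshold.
    """
    a = np.asarray(id_complex, dtype=float)
    b = np.asarray(id_simple, dtype=float)
    if a.shape != b.shape:
        raise ValueError("profiles must have one value per layer of the same model")
    gap = a - b  # positive where the more complex condition has higher ID
    above = np.flatnonzero(gap > threshold)
    onset = int(above[0]) if above.size else None  # first layer past the cutoff
    peak = int(np.argmax(gap))  # first layer with the largest gap
    return gap, onset, peak
```

Calling this once per contrast and per model yields onset and peak layers that can be compared across phenomena and across LLMs.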

Findings

01. ID differences align with known linguistic complexity contrasts

02. ID peaks occur at different layers for different phenomena

03. Representational similarity and pruning validate ID trends
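This page does not name the representational similarity measure the paper uses. As an assumption, one widely used option is linear centered kernel alignment (CKA; Kornblith et al., 2019). It compares two sets of representations of the same inputs, for example the same sentences at two different layers. It equals 1 when the representations differ only by a rotation or an isotropic scaling. The sketch below is illustrative and is not the authors' implementation. The function name `linear_cka` is hypothetical.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (illustrative sketch).

    X: (n_inputs, d1) and Y: (n_inputs, d2); rows must refer to the same inputs,
    but the feature dimensions d1 and d2 may differ.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each feature so the comparison ignores mean offsets
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

Values near 1 indicate that two layers (or two models) organize the same inputs similarly. Values near 0 indicate largely unrelated representations.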

Abstract

We explore intrinsic dimension (ID) of LLM representations as a marker of linguistic complexity. Specifically, we test whether ID differences across model layers reflect well-known complexity contrasts established in (psycho)linguistics: coordination vs. subordination, right-branching vs. center-embedding, and unambiguous vs. ambiguous attachment. Our results on six different LLMs show that these contrasts are consistently reflected in ID differences, with more complex phenomena eliciting higher ID profiles. Notably, ID differences emerge at different points across layers for different contrasts, also reaching their peaks at different stages. Further experiments using representational similarity and layer pruning confirm the trends. We conclude that ID is a useful marker of linguistic complexity in LLMs, that it points to similar linguistic processing steps across disparate LLMs, and…
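The abstract does not say how ID is estimated. As an assumption, the sketch below uses the TwoNN estimator of Facco et al. (2017), a common choice for neural network representations, in its maximum-likelihood form. For each point, the ratio μ = r₂/r₁ of the distances to its second and first nearest neighbours follows a Pareto distribution with shape d. The ID can therefore be estimated as N / Σ log μ. The function name `twonn_id` is hypothetical, and this is not necessarily the estimator or preprocessing used in the paper.

```python
import numpy as np

def twonn_id(X):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood estimator.

    X: array of shape (n_points, n_features), e.g. hidden states from one layer.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    dist = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(dist, np.inf)  # exclude each point's distance to itself
    # First and second nearest-neighbour distances for every point
    nn = np.sort(dist, axis=1)[:, :2]
    r1, r2 = nn[:, 0], nn[:, 1]
    keep = r1 > 0  # drop exact duplicates, whose ratio is undefined
    mu = r2[keep] / r1[keep]
    # mu ~ Pareto(d) under the TwoNN model, so the MLE of d is n / sum(log mu)
    return float(keep.sum() / np.sum(np.log(mu)))
```

Suppose the point cloud is built from one vector per sentence taken at a given layer (an assumption about the setup). Repeating the call for every layer then yields an ID profile across depth, and the profiles for the two sides of a contrast can be compared layer by layer. The estimate is noisy for small samples, so a few hundred points or more per condition is advisable.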