On the Predictive Power of Representation Dispersion in Language Models

Yanhong Li; Ming Li; Karen Livescu; Jiawei Zhou

arXiv:2506.24106 · cs.CL · April 21, 2026

TL;DR

This paper shows that language models whose contextual representations are more widely dispersed tend to achieve lower perplexity, and that this link can be used for label-free model evaluation, retrieval, and training improvements.

Contribution

The paper introduces representation dispersion, the average pairwise cosine distance among hidden vectors, as a predictor of model performance and practical utility, together with methods to measure, leverage, and increase it.

Findings

01 Higher dispersion correlates with lower perplexity across model families and domains.

02 Measuring dispersion on unlabeled text helps rank examples by difficulty and identify hard data slices.

03 Training that increases dispersion lowers perplexity.

Abstract

We show that a language model's ability to predict text is tightly linked to the breadth of its embedding space: models that spread their contextual representations more widely tend to achieve lower perplexity. Concretely, we find that representation dispersion (the average pairwise cosine distance among hidden vectors) strongly and negatively correlates with perplexity across diverse model families (LLaMA, Qwen, and others) and domains (Wikipedia, news, scientific abstracts). Beyond illustrating this link, we show how dispersion can be leveraged for a range of practical tasks without requiring labeled data. First, measuring dispersion on unlabeled text allows us to rank examples by difficulty and identify hard slices in new domains, offering a data-efficient tool for screening and prioritizing models before full evaluation. Next, we find that identifying layers with higher dispersion…
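
The abstract defines representation dispersion as the average pairwise cosine distance among hidden vectors. A minimal sketch of that quantity, assuming the hidden vectors have already been extracted into an (n, d) NumPy array, might look like the following. The function name `representation_dispersion` and the zero-norm guard are illustrative choices, not taken from the paper or its repository, and the paper's exact sampling and layer-selection procedure is not described in this summary.

```python
import numpy as np


def representation_dispersion(hidden: np.ndarray) -> float:
    """Average pairwise cosine distance among the rows of `hidden`.

    `hidden` is assumed to be an (n, d) array holding one hidden vector
    per row. The name and interface are hypothetical.
    """
    hidden = np.asarray(hidden, dtype=np.float64)
    n = hidden.shape[0]
    if n < 2:
        raise ValueError("need at least two vectors to form a pair")

    # Normalize each row to unit length; the clip guards against
    # division by zero for all-zero vectors (an assumption of this sketch).
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    unit = hidden / np.clip(norms, 1e-12, None)

    # Cosine similarity for every pair of rows.
    cos_sim = unit @ unit.T

    # Keep each unordered pair once (strict upper triangle), then
    # average the cosine distance 1 - cos_sim over those pairs.
    rows, cols = np.triu_indices(n, k=1)
    return float(np.mean(1.0 - cos_sim[rows, cols]))
```

Under this definition, identical vectors give a dispersion of 0, mutually orthogonal vectors give 1, and exactly opposite vectors give 2, so larger values mean the representations point in more varied directions.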

Code & Models

Repositories

yanhong-lbh/rep_dispersion (GitHub)

Videos

On the Predictive Power of Representation Dispersion in Language Models (SlidesLive)