Correlation Dimension of Auto-Regressive Large Language Models

Xin Du; Kumiko Tanaka-Ishii

arXiv:2510.21258 · cs.CL · October 27, 2025

TL;DR

This paper introduces correlation dimension, a fractal measure of the structural complexity of text as perceived by a language model, and uses it to reveal phases of training, hallucination tendencies, and degeneration in generated text.

Contribution

The paper proposes a new fractal-geometric metric for analyzing LLMs that captures hierarchical recurrence and long-range structure, which conventional metrics focused on local prediction accuracy (such as perplexity) overlook.
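For reference, the standard Grassberger–Procaccia definition of correlation dimension is sketched below. This summary does not say which points the authors embed or which distance they use, so the symbols are generic assumptions: $x_1, \dots, x_N$ is any sequence of points (for a language model, presumably vectors derived from its predictions along a text) and $\lVert \cdot \rVert$ is a distance between them.

```latex
C(\varepsilon) = \frac{2}{N(N-1)} \sum_{1 \le i < j \le N} \mathbf{1}\!\left[ \lVert x_i - x_j \rVert < \varepsilon \right],
\qquad
C(\varepsilon) \propto \varepsilon^{d}
\;\Longrightarrow\;
d = \lim_{\varepsilon \to 0} \frac{\log C(\varepsilon)}{\log \varepsilon}.
```

With finitely many points the limit cannot be taken literally, so $d$ is usually estimated as the slope of $\log C(\varepsilon)$ against $\log \varepsilon$ over a range of scales where that relationship is roughly linear. Intuitively, a sequence that keeps revisiting the same states produces many near-zero distances and therefore a flatter curve.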

Findings

1. Correlation dimension reveals three training phases.
2. It reflects context-dependent complexity.
3. It detects hallucinations and degeneration.

Abstract

Large language models (LLMs) have achieved remarkable progress in natural language generation, yet they continue to display puzzling behaviors (such as repetition and incoherence) even when exhibiting low perplexity. This highlights a key limitation of conventional evaluation metrics, which emphasize local prediction accuracy while overlooking long-range structural complexity. We introduce correlation dimension, a fractal-geometric measure of self-similarity, to quantify the epistemological complexity of text as perceived by a language model. This measure captures the hierarchical recurrence structure of language, bridging local and global properties in a unified framework. Through extensive experiments, we show that correlation dimension (1) reveals three distinct phases during pretraining, (2) reflects context-dependent complexity, (3) indicates a model's tendency toward…
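The generic correlation-dimension estimate can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: points are rows of an array (in the paper's setting they would come from a language model's outputs along a text, but how is not specified here), distances are Euclidean, and the dimension is the least-squares slope of log C(ε) versus log ε over a user-chosen range `eps_values`. All function names and the toy data are hypothetical.

```python
import numpy as np


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distances for all distinct pairs i < j, flattened to 1-D."""
    diffs = points[:, None, :] - points[None, :, :]   # shape (n, n, dim)
    dist_matrix = np.sqrt((diffs ** 2).sum(axis=-1))  # shape (n, n)
    upper = np.triu_indices(len(points), k=1)         # each pair once, no i == j
    return dist_matrix[upper]


def correlation_integral(dists: np.ndarray, eps: float) -> float:
    """C(eps): fraction of distinct pairs whose distance is below eps."""
    return float(np.mean(dists < eps))


def correlation_dimension(points, eps_values) -> float:
    """Hypothetical helper: least-squares slope of log C(eps) vs log eps."""
    pts = np.asarray(points, dtype=float)
    eps_arr = np.asarray(eps_values, dtype=float)
    dists = pairwise_distances(pts)
    c = np.array([correlation_integral(dists, e) for e in eps_arr])
    if np.any(c == 0):
        raise ValueError("an eps is too small: no pairs fall within it")
    slope, _intercept = np.polyfit(np.log(eps_arr), np.log(c), deg=1)
    return float(slope)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = np.geomspace(0.02, 0.1, 8)  # assumed scaling range for the toy data

    square = rng.random((1000, 2))                    # fills a 2-D region
    t = rng.random(1000)
    segment = np.column_stack([t, np.zeros_like(t)])  # lies on a 1-D segment
    # 10 distinct states, each revisited 100 times: pure recurrence
    periodic = np.tile(np.column_stack([np.arange(10.0), np.zeros(10)]), (100, 1))

    for name, pts in [("square", square), ("segment", segment), ("periodic", periodic)]:
        print(name, round(correlation_dimension(pts, eps), 2))
```

On these toy inputs the estimate comes out close to 2 for the filled square, close to 1 for the segment, and essentially 0 for the periodic set, where every pair is either an exact repeat or farther apart than the largest ε, so C(ε) does not change with ε. That last case only crudely mimics how verbatim repetition flattens C(ε). The all-pairs distance matrix needs O(N²) memory, so long sequences would call for chunking or a neighbor-search structure.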

Videos

Correlation Dimension of Autoregressive Large Language Models (talk, SlidesLive)