A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language?

Ibrahim Alabdulmohsin; Andreas Steiner

arXiv:2502.14924 · cs.CL · May 27, 2025

TL;DR

This paper examines whether large language models reproduce the fractal complexity of natural language, namely its self-similarity and long-range dependence, and suggests fractal parameters as a possible signal for detecting LLM-generated text.

Contribution

The study systematically analyzes LLMs' capacity to reproduce fractal properties of language and introduces a new dataset of LLM-generated and human texts for evaluation.

Findings

01

The fractal parameters of LLM output vary widely, whereas those of natural language fall within a narrow range.

02

Fractal parameters might help detect a non-trivial portion of LLM-generated texts.

03

The findings are robust to the choice of LLM architecture, e.g. Gemini 1.0 Pro, Mistral-7B and Gemma-2B.

Abstract

Language exhibits a fractal structure in its information-theoretic complexity (i.e. bits per token), with self-similarity across scales and long-range dependence (LRD). In this work, we investigate whether large language models (LLMs) can replicate such fractal characteristics and identify conditions, such as temperature setting and prompting method, under which they may fail. Moreover, we find that the fractal parameters observed in natural language are contained within a narrow range, whereas those of LLMs' output vary widely, suggesting that fractal parameters might prove helpful in detecting a non-trivial portion of LLM-generated texts. Notably, these findings, and many others reported in this work, are robust to the choice of architecture, e.g. Gemini 1.0 Pro, Mistral-7B and Gemma-2B. We also release a dataset comprising over 240,000 articles generated by various LLMs (both…
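To make "long-range dependence" concrete, here is a minimal sketch of one common way to quantify it: estimating a Hurst exponent by rescaled-range (R/S) analysis. This is an illustration only and not necessarily the procedure used in the paper. The function names, window sizes, and synthetic series are assumptions; in practice the input would be a per-token sequence of bits computed from a language model's log-probabilities.

```python
import math
import random


def rescaled_range(window):
    """Return R/S for one window: range of cumulative deviations over std."""
    n = len(window)
    mean = sum(window) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for x in window:
        cum += x - mean          # running sum of deviations from the window mean
        lo = min(lo, cum)
        hi = max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    return (hi - lo) / std if std > 0 else None


def hurst_exponent(series, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate H as the least-squares slope of log(mean R/S) against log(n)."""
    xs, ys = [], []
    for n in window_sizes:
        values = []
        # Split the series into non-overlapping windows of length n.
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
    if len(xs) < 2:
        raise ValueError("series too short for the chosen window sizes")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic stand-ins (assumptions, not data from the paper):
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # memoryless series
walk, total = [], 0.0
for x in noise:
    total += x
    walk.append(total)                               # strongly persistent series

h_noise = hurst_exponent(noise)
h_walk = hurst_exponent(walk)
print(f"H (noise) = {h_noise:.2f}, H (walk) = {h_walk:.2f}")
```

A memoryless series yields an estimate near one half (with a known upward bias for short windows), while a persistent series yields a clearly larger value. Comparing such estimates between human and LLM-generated text is the kind of analysis the abstract describes.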

Code & Models

Datasets

ibomohsin/gagle
dataset · 1.2k dl
