A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language?
Ibrahim Alabdulmohsin, Andreas Steiner

TL;DR
This paper explores whether large language models can replicate the fractal complexity of natural language, examining their ability to mimic self-similarity and long-range dependence, and proposes fractal parameters as potential indicators of LLM-generated text.
Contribution
The study systematically analyzes LLMs' capacity to reproduce fractal properties of language and introduces a new dataset of LLM-generated and human texts for evaluation.
Findings
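The fractal parameters of LLM output vary widely, whereas those of natural language fall within a narrow range.
Fractal parameters may help detect a non-trivial portion of LLM-generated texts.
The findings are robust to the choice of architecture (e.g. Gemini 1.0 Pro, Mistral-7B and Gemma-2B).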
Abstract
Language exhibits a fractal structure in its information-theoretic complexity (i.e. bits per token), with self-similarity across scales and long-range dependence (LRD). In this work, we investigate whether large language models (LLMs) can replicate such fractal characteristics and identify conditions, such as temperature setting and prompting method, under which they may fail. Moreover, we find that the fractal parameters observed in natural language are contained within a narrow range, whereas those of LLMs' output vary widely, suggesting that fractal parameters might prove helpful in detecting a non-trivial portion of LLM-generated texts. Notably, these findings, and many others reported in this work, are robust to the choice of the architecture; e.g. Gemini 1.0 Pro, Mistral-7B and Gemma-2B. We also release a dataset comprising over 240,000 articles generated by various LLMs (both…
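To make "long-range dependence" concrete, here is a minimal sketch of one common way to quantify it: estimating a Hurst exponent H with rescaled-range (R/S) analysis. This summary does not say which estimator the authors use, so the method, the function names (`rescaled_range`, `hurst_rs`) and the `min_window` parameter are assumptions for illustration only. The input stands for a hypothetical sequence of per-token bits (negative log2 probabilities) from some scoring model. Values of H near 0.5 indicate no long-range dependence; values above 0.5 indicate persistence.

```python
import math


def rescaled_range(chunk):
    """Return R/S for one window, or None if the window is constant."""
    mean = sum(chunk) / len(chunk)
    cumulative = 0.0
    lowest = highest = 0.0
    squared = 0.0
    for value in chunk:
        deviation = value - mean
        cumulative += deviation          # running sum of mean-centred values
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
        squared += deviation * deviation
    std = math.sqrt(squared / len(chunk))
    if std == 0.0:
        return None
    return (highest - lowest) / std


def hurst_rs(series, min_window=16):
    """Estimate H as the slope of log(mean R/S) against log(window size).

    Hypothetical helper: window sizes double from min_window up to half the
    series length, and the slope comes from ordinary least squares.
    """
    n = len(series)
    log_sizes, log_rs = [], []
    window = min_window
    while window <= n // 2:
        values = []
        for start in range(0, n - window + 1, window):
            rs = rescaled_range(series[start:start + window])
            if rs is not None:
                values.append(rs)
        if values:
            log_sizes.append(math.log(window))
            log_rs.append(math.log(sum(values) / len(values)))
        window *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for the chosen min_window")
    mean_x = sum(log_sizes) / len(log_sizes)
    mean_y = sum(log_rs) / len(log_rs)
    numerator = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(log_sizes, log_rs))
    denominator = sum((x - mean_x) ** 2 for x in log_sizes)
    return numerator / denominator
```

Calling `hurst_rs(bits_per_token)` on a list of a few thousand per-token values gives a single number to compare across texts. R/S analysis is known to be biased upward on short series, so the estimate is better read comparatively than as an absolute value.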
