Fractal Patterns May Illuminate the Success of Next-Token Prediction
Ibrahim Alabdulmohsin, Vinh Q. Tran, Mostafa Dehghani

TL;DR
This paper shows that language has a fractal structure (it is self-similar and long-range dependent) and that these properties are consistent across domains and architectures. It argues that this may help explain how next-token prediction captures the structure of text.
Contribution
The paper formally characterizes the fractal nature of language, links its fractal parameters to model performance, and offers a new perspective on the structure of language and on the success of LLMs.
Findings
Language is self-similar and long-range dependent, with a Hurst parameter of H ≈ 0.7.
Fractal parameters are robust across domains and architectures.
Variations in fractal parameters improve downstream performance prediction.
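The block below gives the textbook definitions behind the first finding for reference. The symbols X, S, ρ and c are notation assumed here, and the paper's own definitions and estimators may differ. Self-similarity means that rescaling time by τ rescales the process by τ^S in distribution. Long-range dependence means that the autocorrelation at lag k decays too slowly to be summable. Power-law decay of the form k^(2H−2) has this property whenever 1/2 < H < 1.

```latex
\begin{align*}
X(\tau t) &\overset{d}{=} \tau^{S}\, X(t), \quad \tau > 0
  && \text{(self-similarity with exponent } S\text{)} \\
\rho(k) &\sim c\, k^{2H-2}, \quad k \to \infty
  && \text{(power-law autocorrelation, } c > 0\text{)} \\
H = 0.7 &\;\Rightarrow\; \rho(k) \propto k^{-0.6}
  \;\Rightarrow\; \sum_{k \ge 1} \rho(k) = \infty
  && \text{(long-range dependence)}
\end{align*}
```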
Abstract
We study the fractal structure of language, aiming to provide a precise formalism for quantifying properties that may have been previously suspected but not formally shown. We establish that language is: (1) self-similar, exhibiting complexities at all levels of granularity, with no particular characteristic context length, and (2) long-range dependent (LRD), with a Hurst parameter of approximately H=0.7. Based on these findings, we argue that short-term patterns/dependencies in language, such as in paragraphs, mirror the patterns/dependencies over larger scopes, like entire documents. This may shed some light on how next-token prediction can capture the structure of text across multiple levels of granularity, from words and clauses to broader contexts and intents. In addition, we carry out an extensive analysis across different domains and architectures, showing that fractal parameters…
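Rescaled-range (R/S) analysis is one classical way to estimate a Hurst parameter such as the H ≈ 0.7 reported above. The sketch below is a minimal pure-Python version and is not the authors' pipeline. The function names `rescaled_range` and `hurst_rs`, the doubling window sizes, and the synthetic inputs are all illustrative assumptions. For text, the input would be some numeric sequence derived from tokens, for example per-token surprisal under a language model. That choice is also an assumption here.

```python
import math
import random


def rescaled_range(window):
    """Return the R/S statistic of one window, or None if it has zero variance."""
    n = len(window)
    mean = sum(window) / n
    # Range of the cumulative deviations from the window mean. The running sum
    # ends at 0, so starting lo/hi at 0 covers every partial sum.
    cum = lo = hi = 0.0
    for x in window:
        cum += x - mean
        lo = min(lo, cum)
        hi = max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    return (hi - lo) / std if std > 0 else None


def hurst_rs(series, min_window=8):
    """Estimate H as the OLS slope of log mean(R/S) against log window size.

    Window sizes double from min_window up to len(series) // 4; at each size,
    R/S is averaged over non-overlapping windows.
    """
    log_n, log_rs = [], []
    n = min_window
    while n <= len(series) // 4:
        ratios = [
            r
            for start in range(0, len(series) - n + 1, n)
            if (r := rescaled_range(series[start:start + n])) is not None
        ]
        if ratios:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
        n *= 2
    if len(log_n) < 2:
        raise ValueError("series too short for the requested window sizes")
    # Ordinary least-squares slope of log(R/S) on log(n).
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_rs) / len(log_rs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_rs))
    var = sum((x - mean_x) ** 2 for x in log_n)
    return cov / var


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    # A random walk is the cumulative sum of the noise.
    walk, total = [], 0.0
    for step in noise:
        total += step
        walk.append(total)
    print(f"white noise: H ~ {hurst_rs(noise):.2f}")
    print(f"random walk: H ~ {hurst_rs(walk):.2f}")
```

White noise has no long-range dependence, so its estimate should be near 0.5. Uncorrected R/S is known to read slightly high at small window sizes. A random walk should read near 1. For a stationary series, estimates between 0.5 and 1, such as the H ≈ 0.7 reported for language, indicate persistent, long-range dependent fluctuations.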
Taxonomy
Topics: Neural Networks and Applications
