Fractal Patterns May Illuminate the Success of Next-Token Prediction

Ibrahim Alabdulmohsin; Vinh Q. Tran; Mostafa Dehghani

arXiv:2402.01825 · cs.CL · May 24, 2024 · 1 citation

Open Access

TL;DR

This paper shows that language has a fractal structure: it is self-similar and long-range dependent. These properties hold consistently across domains and architectures, which may help explain why next-token prediction works in language models.

Contribution

It formally characterizes the fractal nature of language, linking it to model performance and providing a new perspective on language structure and LLM success.

Findings

01. Language is self-similar and long-range dependent, with a Hurst parameter H ≈ 0.7.

02. Fractal parameters are robust across domains and architectures.

03. Variations in fractal parameters improve the prediction of downstream performance.

Abstract

We study the fractal structure of language, aiming to provide a precise formalism for quantifying properties that may have been previously suspected but not formally shown. We establish that language is: (1) self-similar, exhibiting complexities at all levels of granularity, with no particular characteristic context length, and (2) long-range dependent (LRD), with a Hurst parameter of approximately H=0.7. Based on these findings, we argue that short-term patterns/dependencies in language, such as in paragraphs, mirror the patterns/dependencies over larger scopes, like entire documents. This may shed some light on how next-token prediction can capture the structure of text across multiple levels of granularity, from words and clauses to broader contexts and intents. In addition, we carry out an extensive analysis across different domains and architectures, showing that fractal parameters…
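To make the Hurst parameter mentioned in the abstract more concrete, here is a minimal sketch of one common estimator, rescaled-range (R/S) analysis, applied to synthetic data. It is an illustration under stated assumptions, not a reproduction of the paper's pipeline: the function name `hurst_rs`, the `min_chunk` parameter, and the synthetic inputs are all hypothetical, and in the paper's setting the input series would be derived from a language model's per-token outputs rather than from random numbers.

```python
import numpy as np


def hurst_rs(x, min_chunk=16):
    """Estimate the Hurst exponent of a series via rescaled-range analysis.

    H near 0.5 suggests no long-range dependence; H above 0.5 suggests
    persistent, long-range dependent behaviour.
    (Hypothetical helper for illustration only.)
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes, rs_means = [], []
    size = min_chunk
    while size <= n // 2:
        k = n // size
        # Split the series into k non-overlapping windows of equal length.
        chunks = x[: k * size].reshape(k, size)
        # Cumulative deviations from each window's mean.
        z = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        r = z.max(axis=1) - z.min(axis=1)  # range of cumulative deviations
        s = chunks.std(axis=1)             # per-window standard deviation
        valid = s > 0
        if valid.any():
            sizes.append(size)
            rs_means.append(np.mean(r[valid] / s[valid]))
        size *= 2
    # R/S grows roughly like size**H, so H is the slope in log-log space.
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return float(slope)


# Synthetic inputs (assumptions, not data from the paper).
rng = np.random.default_rng(0)
noise = rng.standard_normal(2**14)  # uncorrelated increments
walk = np.cumsum(noise)             # strongly persistent series
print("white noise:", hurst_rs(noise))
print("random walk:", hurst_rs(walk))
```

The uncorrelated series should give an estimate near 0.5 (R/S analysis tends to read slightly high on finite samples), while the strongly persistent series should give a clearly larger value. An estimate around 0.7, as reported in the abstract for language, would sit between these two extremes.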

Taxonomy

Topics: Neural Networks and Applications