Turbulence-like 5/3 spectral scaling in contextual representations of language as a complex system
Zhongxin Yang, Chun Bao, Yuanwei Bin, Xiang I.A. Yang, Shiyi Chen

TL;DR
This study reveals a consistent turbulence-like 5/3 spectral scaling in high-dimensional contextual language embeddings, indicating scale-free, self-similar organization of semantic information across linguistic scales.
Contribution
It demonstrates a consistent power law in transformer-based contextual embeddings across languages and corpora, and draws an analogy between this structure and the Kolmogorov spectrum of turbulence.
Findings
The power spectrum of the embedding-step signal exhibits a robust 5/3 power law across multiple languages and corpora.
Scaling is present in contextual embeddings but absent in static word embeddings.
Randomizing token order disrupts the observed spectral scaling.
Abstract
Natural language is a complex system that exhibits robust statistical regularities. Here, we represent text as a trajectory in a high-dimensional embedding space generated by transformer-based language models, and quantify scale-dependent fluctuations along the token sequence using an embedding-step signal. Across multiple languages and corpora, the resulting power spectrum exhibits a robust power law with an exponent close to 5/3 over an extended frequency range. This scaling is observed consistently in contextual embeddings from both human-written and AI-generated text, but is absent in static word embeddings and is disrupted by randomization of token order. These results show that the observed scaling reflects multiscale, context-dependent organization rather than lexical statistics alone. By analogy with the Kolmogorov spectrum in turbulence, our findings suggest that semantic…
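The pipeline described in the abstract (embedding trajectory, step signal, power spectrum, power-law fit, shuffled control) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The summary does not define the embedding-step signal, so the sketch assumes it is the Euclidean norm of the difference between consecutive token embeddings. The function names, the fit range, and the synthetic test signal are all hypothetical, and in practice the embeddings would come from a transformer model rather than a random generator.

```python
import numpy as np

def embedding_step_signal(embeddings):
    """Assumed definition: norm of the step between consecutive token embeddings.

    embeddings: array of shape (n_tokens, dim); returns shape (n_tokens - 1,).
    """
    steps = np.diff(embeddings, axis=0)
    return np.linalg.norm(steps, axis=1)

def power_spectrum(signal):
    """Periodogram of a 1-D signal, with the zero-frequency bin dropped."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0)  # cycles per token
    return freqs[1:], spec[1:]

def fit_exponent(freqs, spec, f_lo, f_hi):
    """Least-squares slope in log-log space; returns alpha for spec ~ f**(-alpha)."""
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    slope, _ = np.polyfit(np.log(freqs[mask]), np.log(spec[mask]), 1)
    return -slope

# Synthetic stand-in for a step signal: fixed amplitudes f**(-5/6) with random
# phases, so the power spectrum follows f**(-5/3) by construction.
rng = np.random.default_rng(0)
n = 4096
f = np.fft.rfftfreq(n, d=1.0)
coeffs = np.zeros(len(f), dtype=complex)
coeffs[1:] = f[1:] ** (-5.0 / 6.0) * np.exp(2j * np.pi * rng.random(len(f) - 1))
synthetic = np.fft.irfft(coeffs, n=n)

# Fit over a hypothetical range that excludes the lowest bins and the Nyquist bin.
alpha = fit_exponent(*power_spectrum(synthetic), f_lo=1e-3, f_hi=0.25)

# Control: shuffling the sequence order should flatten the spectrum.
shuffled = rng.permutation(synthetic)
alpha_shuffled = fit_exponent(*power_spectrum(shuffled), f_lo=1e-3, f_hi=0.25)

print(f"fitted exponent: {alpha:.3f}, after shuffling: {alpha_shuffled:.3f}")
```

On real data, `embedding_step_signal` would be applied to the per-token contextual embeddings first, and the fit range would need to be chosen by inspecting the spectrum rather than fixed in advance.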
