The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions
Jarosław Hryszko

TL;DR
This paper introduces Entropic Deviation (ED) to quantify the intrinsic non-randomness in language model token distributions, showing that most of it comes from the learned weights rather than the prompt and that it differs across architectures and languages.
Contribution
It systematically measures the intrinsic non-randomness in language models using ED, establishing a structural lower bound and comparing transformer and state space models.
Findings
Transformers exhibit an ED of about 0.30 under neutral prompts.
The state space model (Mamba2) shows a higher ED and greater temperature sensitivity than the transformers.
Language modulates the intrinsic non-randomness independently of tokenisation.
Abstract
Language models cannot be random. This paper introduces Entropic Deviation (ED), the normalised KL divergence between a model's token distribution and the uniform distribution, and measures it systematically across 31,200 generations spanning seven models, two architectures (transformer and state space), nine prompt categories, three temperatures, and five languages. Under semantically neutral prompts (empty strings, random characters, nonsense syllables), transformers still exhibit an ED of approximately 0.30, meaning that 88-93% of the non-randomness observed under semantic prompts is intrinsic to the learned weights rather than induced by context. Three transformer families (Gemma, Llama, Qwen) converge on nearly identical ED values despite different training data and vocabularies. A state space model (Mamba2) reveals a qualitatively different regime: twice the ED, three times lower…
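The abstract describes ED as a normalised KL divergence against the uniform distribution. The sketch below shows one way such a quantity could be computed for a single next-token distribution. It assumes the normalisation constant is log V (the largest possible KL divergence from uniform over a vocabulary of size V), which gives ED = 0 for a uniform distribution and ED = 1 for a one-hot distribution; the paper's exact normalisation is not given in this summary. The function names `entropic_deviation` and `softmax` are illustrative and not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities at a given sampling temperature."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropic_deviation(probs):
    """KL(p || uniform) divided by log(V).

    Assumption: normalising by log(V) is our reading of "normalised KL
    divergence"; it maps the result into the range [0, 1].
    """
    v = len(probs)
    if v < 2:
        raise ValueError("need a vocabulary of at least two tokens")
    total = sum(probs)
    # KL(p || U) = sum_i p_i * log(p_i * V); zero-probability terms contribute 0.
    kl = sum((p / total) * math.log((p / total) * v) for p in probs if p > 0)
    return kl / math.log(v)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]  # toy logits, not model output
    for t in (0.5, 1.0, 1.5):
        print(t, entropic_deviation(softmax(logits, temperature=t)))
```

Under this definition, raising the temperature flattens the distribution and lowers ED, which is one way to read the temperature-sensitivity comparison mentioned in the findings.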
