Berezinskii--Kosterlitz--Thouless transition in a context-sensitive random language model
Yuma Toji, Jun Takahashi, Vwani Roychowdhury, and Hideyuki Miyahara

TL;DR
This paper introduces a simple probabilistic language model inspired by the one-dimensional Potts model and numerically demonstrates that it undergoes a Berezinskii--Kosterlitz--Thouless phase transition, suggesting that the critical properties observed in natural languages may be universal.
Contribution
It constructs a context-sensitive probabilistic language model that exhibits a BKT phase transition, linking language phenomena to statistical physics concepts.
Findings
Identifies a phase transition in a language model via an order parameter.
Shows the transition is of BKT type, a class in which critical behavior persists throughout an entire phase rather than only at the transition point.
Suggests natural language criticality may stem from underlying physics principles.
Abstract
Several power-law critical properties involving different statistics in natural languages -- reminiscent of scaling properties of physical systems at or near phase transitions -- have been documented for decades. The recent rise of large language models has added further evidence and excitement by providing intriguing similarities with notions in physics such as scaling laws and emergent abilities. However, specific instances of classes of generative language models that exhibit phase transitions, as understood by the statistical physics community, are lacking. In this work, inspired by the one-dimensional Potts model in statistical physics, we construct a simple probabilistic language model that falls under the class of context-sensitive grammars, which we call the context-sensitive random language model, and numerically demonstrate an unambiguous phase transition in the framework of a…
