Critical Phase Transition in Large Language Models
Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima

TL;DR
This paper investigates whether large language models exhibit phase transitions. Analysis of generated text suggests that, as the sampling temperature is varied, model behavior changes qualitatively at a specific temperature, in a way that resembles a physical phase transition.
Contribution
The study provides the first analysis of phase transition phenomena in LLMs, identifying critical points and behaviors similar to natural phase transitions in physical systems.
Findings
Divergent statistical quantities at specific temperature points
Power-law decay of correlations near the transition
Slow convergence to stationary states in LLMs
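As a rough illustration of what "power-law decay of correlations" means in practice, the sketch below estimates a two-point correlation of a numeric sequence and fits an exponent on a log-log scale. It is not the authors' analysis code: the function names `autocorrelation` and `fit_power_law_exponent`, the toy inputs, and the choice of a plain least-squares fit are all assumptions made for illustration.

```python
import math

def autocorrelation(seq, lag):
    """Connected correlation <x_i * x_{i+lag}> - <x_i><x_{i+lag}> of a numeric sequence."""
    n = len(seq) - lag
    if n <= 0:
        raise ValueError("lag must be smaller than the sequence length")
    head, tail = seq[:n], seq[lag:]
    mean_head = sum(head) / n
    mean_tail = sum(tail) / n
    mean_prod = sum(a * b for a, b in zip(head, tail)) / n
    return mean_prod - mean_head * mean_tail

def fit_power_law_exponent(lags, values):
    """Fit |C(lag)| ~ lag**(-alpha) by least squares on log-log axes; returns alpha.

    Assumes every lag is positive and every value is nonzero.
    """
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(abs(v)) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Hypothetical toy data: an exactly power-law correlation with alpha = 0.5.
toy_lags = [1, 2, 4, 8, 16, 32]
toy_values = [lag ** -0.5 for lag in toy_lags]
toy_alpha = fit_power_law_exponent(toy_lags, toy_values)

# Hypothetical binary indicator sequence (for example, 1 where a given tag occurs).
toy_sequence = [1, 0] * 50
toy_corr = autocorrelation(toy_sequence, 2)
```

For real generated text, one would replace `toy_sequence` with an indicator sequence derived from the text and compute the correlation over a range of lags before fitting.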
Abstract
Large Language Models (LLMs) have demonstrated impressive performance. To understand their behaviors, we need to consider the fact that LLMs sometimes show qualitative changes. The natural world also presents such changes called phase transitions, which are defined by singular, divergent statistical quantities. Therefore, an intriguing question is whether qualitative changes in LLMs are phase transitions. In this work, we have conducted extensive analysis on texts generated by LLMs and suggested that a phase transition occurs in LLMs when varying the temperature parameter. Specifically, statistical quantities have divergent properties just at the point between the low-temperature regime, where LLMs generate sentences with clear repetitive structures, and the high-temperature regime, where generated sentences are often incomprehensible. In addition, critical behaviors near the phase…
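For readers unfamiliar with the temperature parameter, the following minimal sketch shows the standard way temperature rescales a model's output logits before sampling. It is a generic illustration, not the paper's code; the function names and the toy logits are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing each logit by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical logits for a three-token vocabulary.
toy_logits = [2.0, 1.0, 0.0]
low_t = softmax_with_temperature(toy_logits, 0.1)    # sharply peaked on the top token
high_t = softmax_with_temperature(toy_logits, 10.0)  # close to uniform
```

At low temperature the distribution concentrates on the most likely token, which is consistent with the repetitive low-temperature regime described in the abstract; at high temperature it flattens toward uniform, consistent with the incomprehensible high-temperature regime.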
Peer Reviews
Decision·Submitted to ICLR 2025
* The motivation behind the paper is nice: the application of a well-studied concept from physics can perhaps allow us to use knowledge about/properties of that concept to better understand natural language and language models
* The work provides a comparison of quantitative aspects of human- and machine-generated language, an approach that is more objective than the qualitative comparisons that are often done and thus perhaps a better ground from which to draw conclusions
* The finding that th
* The paper is generally difficult to follow:
  * There is confusing terminology that isn't defined or contextualized before it is used (e.g., “long-range correlation” in the introduction). This will likely confuse most readers (myself included)
  * The implications of the observed critical properties are incredibly unclear. See subsequent questions for the parts that felt particularly unclear to me, although this is not comprehensive. As such, it is difficult to draw meaningful conclusions from
This paper investigates whether LLMs go through phase transition with the temperature parameter. This (I believe) contrasts with much of the literature, which is about the possibility of a phase transition with the size of the model. The statistics that the authors introduce may have value for future work that aims to study structural features of LLM generations. The writing and organization are clear, and the figures are clearly explained.
- There are obvious empirical concerns: namely, in the main paper, the authors only study GPT-2 small, and all of the analysis is specifically about the structure of where the proper noun PROPN tag occurs in generated text. This must be quite narrow, and it would be more convincing if the authors could summarize the results for other models and other POS tags in the main paper (they are referenced as being in the Appendix, which I did not read).
- This is very far from my area of expertise, but
1. The project fosters the understanding of LLMs from the interesting angle of phase transitions.
2. The experiments are well designed and clearly documented.
1. While LLMs have developed rapidly over the past few years, this work tests on GPT-2 instead of newer generations of models.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
Methods: Attention Is All You Need · Cosine Annealing · Layer Normalization · Byte Pair Encoding · Adam · Attention Dropout · Weight Decay · Linear Warmup With Cosine Annealing · Linear Layer
