Evidence of Phase Transitions in Small Transformer-Based Language Models
Noah Hong, Tao Hong

TL;DR
This paper shows that phase transitions, which have been linked to emergent abilities in language models, can be observed in small transformer models early in training through vocabulary and statistical analysis, without requiring large models or a log-scaled training axis.
Contribution
The paper introduces methods for detecting phase transitions in small models directly in linear training space, revealing early and generalizable reorganizations during language model training.
Findings
Phase transitions occur early in training of small models.
Vocabulary-based metrics reveal transitions not visible in loss curves.
Transitions are detectable directly in linear training space.
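As an illustration of what a vocabulary-based metric tracked in linear training space could look like, here is a minimal sketch. It is not the authors' method: the metric (fraction of generated words found in a reference vocabulary), the helper names `valid_word_fraction` and `largest_jump`, and the toy checkpoint samples are all assumptions made for this example. The idea is to score text sampled at evenly spaced training steps and look for the largest change between consecutive checkpoints, without rescaling the step axis.

```python
import re

def valid_word_fraction(text, vocabulary):
    """Fraction of words in `text` that appear in `vocabulary` (hypothetical metric)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)

def largest_jump(steps, values):
    """Return (step, delta) for the largest increase between consecutive checkpoints.

    Steps are used as-is, on a linear axis; no log rescaling is applied.
    """
    if len(steps) != len(values) or len(steps) < 2:
        raise ValueError("need at least two checkpoints of equal length")
    deltas = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    i = max(range(len(deltas)), key=deltas.__getitem__)
    return steps[i + 1], deltas[i]

# Toy data (invented for illustration): text sampled from a model at each step.
vocabulary = {"the", "cat", "sat", "on", "mat"}
samples = {
    0: "xq zvt the pll",
    100: "thw cat sxt qn",
    200: "the cat sat on",
    300: "the cat sat on the mat",
}

steps = sorted(samples)
fractions = [valid_word_fraction(samples[s], vocabulary) for s in steps]
jump_step, jump_size = largest_jump(steps, fractions)
print(f"largest jump of {jump_size:.2f} ends at step {jump_step}")
```

In this toy run the metric stays flat and then rises sharply between two adjacent checkpoints, which is the kind of abrupt change a smooth loss curve might not show. A real analysis would need many more checkpoints and a statistical test rather than a single largest difference.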
Abstract
Phase transitions have been proposed as the origin of emergent abilities in large language models (LLMs), where new capabilities appear abruptly once models surpass critical thresholds of scale. Prior work, such as that of Wei et al., demonstrated these phenomena under model and data scaling, with transitions revealed after applying a log scale to training compute. In this work, we ask three complementary questions: (1) Are phase transitions unique to large models, or can they also be observed in small transformer-based language models? (2) Can such transitions be detected directly in linear training space, rather than only after log rescaling? and (3) Can these transitions emerge at early stages of training? To investigate, we train a small GPT-style transformer on a character-level corpus and analyze the evolution of vocabulary usage throughout training. We track the average word…
Taxonomy
Topics: Language and cultural evolution · Topic Modeling · Computational and Text Analysis Methods
