Random Scaling of Emergent Capabilities

Rosie Zhao; Tian Qin; David Alvarez-Melis; Sham Kakade; Naomi Saphra

arXiv:2502.17356·cs.LG·February 19, 2026

Open Access 3 Reviews

TL;DR

This paper investigates how emergent capabilities in language models are driven by continuous changes in training outcome distributions across random seeds, rather than abrupt threshold effects, revealing the importance of seed variability in performance scaling.

Contribution

It demonstrates that emergent capabilities result from gradual distribution shifts across seeds, challenging the notion of sudden threshold-based breakthroughs in model scaling.

Findings

1. Emergent performance is linked to bimodal distribution shifts across seeds.
2. Sharp metric breakthroughs are due to underlying distribution changes, not thresholds.
3. Seed variability significantly influences performance predictions at scale.

Abstract

Language models famously improve under a smooth scaling law, but some specific capabilities exhibit sudden breakthroughs in performance. Advocates of "emergence" view these capabilities as unlocked at a specific scale, but others attribute breakthroughs to superficial metric thresholding effects. We propose that breakthroughs are instead driven by continuous changes in the probability distribution of training outcomes when performance is bimodally distributed across random seeds. We show that different random seeds can produce either smooth or emergent scaling trends in synthetic length generalization tasks, multiple choice question answering, and grammatical generalization. We reveal that sharp breakthroughs in metrics are produced by underlying continuous changes in their distribution across seeds. These distributions may become abruptly bimodal at a capacity threshold but this…
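The distributional argument in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' code or data: the logistic `success_probability`, the mode centers (0.1 and 0.9), the noise level, and the seed count are all hypothetical values chosen for illustration. It assumes each seed's accuracy lands in either a low or a high mode, and that only the probability of landing in the high mode changes smoothly with scale.

```python
import math
import random


def success_probability(log_scale, midpoint=3.0, steepness=1.5):
    """Hypothetical smooth (logistic) chance that a seed lands in the high mode."""
    return 1.0 / (1.0 + math.exp(-steepness * (log_scale - midpoint)))


def sample_accuracy(log_scale, rng, low=0.1, high=0.9, noise=0.03):
    """Draw one seed's accuracy from a two-mode mixture (illustrative values)."""
    center = high if rng.random() < success_probability(log_scale) else low
    # Clamp so the simulated accuracy stays in [0, 1].
    return min(1.0, max(0.0, rng.gauss(center, noise)))


def summarize(log_scales, n_seeds=200, seed=0):
    """Return (log_scale, fraction of seeds in the high mode, mean accuracy)."""
    rng = random.Random(seed)
    rows = []
    for s in log_scales:
        accs = [sample_accuracy(s, rng) for _ in range(n_seeds)]
        frac_high = sum(a > 0.5 for a in accs) / n_seeds
        mean = sum(accs) / n_seeds
        rows.append((s, frac_high, mean))
    return rows


if __name__ == "__main__":
    for s, frac_high, mean in summarize([1, 2, 3, 4, 5]):
        print(f"log_scale={s}  frac_high={frac_high:.2f}  mean={mean:.2f}")
```

In this toy setting, any single seed sits near one of the two modes, so a lone run compared across scales can look like a sudden jump, while the fraction of seeds in the high mode (and hence the mean across seeds) moves gradually with scale.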

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 4·Confidence 2

Strengths

- The paper provides a careful conceptual lens through which to view the well-studied phenomenon of emergent capabilities. This viewpoint is empirically well-supported.
- Experiments are extensive, showing the robustness of results for a wide range of metrics (continuous vs discrete, mode vs mean) across seeds and datasets (synthetic and real-world).

Weaknesses

- Potential for impact: Although the finding that not all individual seeds themselves exhibit non-linearity in emergent capabilities is interesting, it is not clear what the impact of the empirical findings in the work is. If what appears as emergence is that the mode of the performance distributions sharply increases, is this not a form of emergence? What are the implications of this work for how we study and evaluate models?
- Some analysis decisions are arbitrary: For example, why is 20% exa

Reviewer 02·Rating 2·Confidence 3

Strengths

The paper studies emergent capabilities from a novel perspective, from a distributional perspective of many training runs instead of a single training run. This is important and helpful because neural network learning is inherently stochastic.

Weaknesses

1. I could not agree with the paper's explanation that emergent capabilities are driven by the bimodal distribution in capabilities, that "This variability is precisely what causes some model runs to appear as breakthroughs while others follow a more linear progression." I believe the causality should be the other way around. Some training runs show a breakthrough, so that the capability performance improves abruptly from one mode to another. And other training runs show linear improvement. When

Reviewer 03·Rating 4·Confidence 3

Strengths

1. The proposed observation of bimodal distribution is interesting and makes sense as a potential explanation for emergent abilities.
2. The emergence from unimodal to bimodal distribution as a sign of possessing minimum capability is an interesting and well-explained observation.
3. The paper is clear and easy to follow.

Weaknesses

1. I would suggest changing the title of Section 2 from "Experiment" to "Experimental Setup." You only introduce the setup there.
2. Typos in lines 246-247: "we see that the probability (Figure 3.2 (bottom left) and mean (bottom right) of such “successful”." Throughout the paper, you seem to regard Figures 3 and 6, which have 4 subfigures, as being displayed as a 2*2 layout.
3. In lines 359 & 368, Figure 3.5 is mislinked to Figure 6.
4. In line 414, incorrect citation format. ("...process the mu

Taxonomy

Topics: Complex Systems and Decision Making