Data Mixing Can Induce Phase Transitions in Knowledge Acquisition

Xinran Gu; Kaifeng Lyu; Jiazheng Li; Jingzhao Zhang

arXiv:2505.18091 · cs.LG · May 12, 2026

TL;DR

This paper shows that when large language models are trained on data mixtures, knowledge acquisition from the knowledge-dense portion can undergo abrupt phase transitions as model size and the mixing ratio vary. The authors attribute this to how the model allocates its limited capacity across the datasets in the mixture.

Contribution

The paper introduces the notion of phase transitions in knowledge acquisition when LLMs are trained on data mixtures, and supports it with a theoretical framework and controlled experiments.

Findings

01

As model size increases to a critical value, the model abruptly shifts from memorizing very few of the biographies to memorizing most of them.

02

Below a critical mixing ratio, the model memorizes almost nothing even with extensive training; above it, memorization increases rapidly.

03

Critical mixing ratios follow a power-law relationship with model size.
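
As a rough illustration of what a power-law relationship between critical mixing ratio and model size means in practice, the sketch below fits r_c ≈ a · N^b by linear regression in log-log space. Everything in it is an assumption made for illustration: the function name fit_power_law, the model sizes, the coefficient 50, and the exponent -0.5 are all hypothetical, and the data points are generated synthetically rather than taken from the paper. The paper's own fitting procedure and fitted values are not reproduced here.

```python
import numpy as np


def fit_power_law(model_sizes, critical_ratios):
    """Fit critical_ratio ~ coeff * model_size ** exponent.

    A power law is a straight line in log-log space, so an ordinary
    least-squares line fit on the logs recovers both parameters.
    """
    log_n = np.log(np.asarray(model_sizes, dtype=float))
    log_r = np.log(np.asarray(critical_ratios, dtype=float))
    # polyfit returns coefficients from highest degree down: slope, intercept.
    exponent, log_coeff = np.polyfit(log_n, log_r, deg=1)
    return float(np.exp(log_coeff)), float(exponent)


# Hypothetical model sizes (parameter counts); NOT values from the paper.
sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])

# Synthetic critical ratios generated from an assumed law r = 50 * N^(-0.5).
# Both the coefficient and the sign/magnitude of the exponent are assumptions.
ratios = 50.0 * sizes ** -0.5

coeff, exponent = fit_power_law(sizes, ratios)
print(f"fitted coefficient: {coeff:.4f}, fitted exponent: {exponent:.4f}")
```

With measured critical ratios in place of the synthetic ones, the fitted exponent would summarize how quickly the critical ratio changes with model size; with noisy measurements the log-log fit is only approximate, and a residual check would be a sensible next step.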

Abstract

Large Language Models (LLMs) are typically trained on data mixtures: most data come from web scrapes, while a small portion is curated from high-quality sources with dense domain-specific knowledge. In this paper, we show that when training LLMs on such data mixtures, knowledge acquisition from knowledge-dense datasets, unlike training exclusively on knowledge-dense data (arXiv:2404.05405), does not always follow a smooth scaling law but can exhibit phase transitions with respect to the mixing ratio and model size. Through controlled experiments on a synthetic biography dataset mixed with web-scraped data, we demonstrate that: (1) as we increase the model size to a critical value, the model suddenly transitions from memorizing very few to most of the biographies; (2) below a critical mixing ratio, the model memorizes almost nothing even with extensive training, but beyond this…
