Scaling Laws for Optimal Data Mixtures
Mustafa Shukor, Louis Bethune, Dan Busbridge, David Grangier, Enrico Fini, Alaaeldin El-Nouby, Pierre Ablin

TL;DR
This paper introduces a scaling law framework that predicts model loss from model size, training-token count, and data mixture, enabling selection of optimal domain weights without extensive trial and error in large-scale model training.
Contribution
The paper develops a systematic, predictive approach, based on scaling laws, for determining the optimal data mixture when training large models on data from multiple domains.
Findings
The scaling laws accurately predict loss across different model sizes, token budgets, and data mixtures.
The laws extrapolate to unseen data mixtures and to other scales after being fit on only a small amount of small-scale training.
Optimal domain weights can be derived directly from the fitted laws, reducing trial and error in data selection.
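The last finding can be illustrated with a minimal sketch: once a law has been fitted, picking domain weights reduces to minimizing the predicted loss over the probability simplex. Everything in the block below is an assumption made for illustration: the additive functional form, the parameter values, and the helper names `predicted_loss`, `simplex_grid`, and `best_mixture` are hypothetical and are not taken from the paper.

```python
import itertools

# Hypothetical fitted parameters (illustrative only, not values from the paper).
E = 1.8                   # irreducible loss
A, ALPHA = 400.0, 0.34    # model-size term A / N**ALPHA
B, BETA = 1500.0, 0.28    # token-count term B / D**BETA
C = [0.9, 0.5, 0.7]       # per-domain coefficients C_i
GAMMA = [0.5, 0.3, 0.4]   # per-domain exponents gamma_i


def predicted_loss(n_params, n_tokens, h):
    """Assumed law: E + 1/sum_i(C_i * h_i**gamma_i) + A/N**alpha + B/D**beta."""
    mix = sum(c * (w ** g) for c, w, g in zip(C, h, GAMMA))
    return E + 1.0 / mix + A / n_params ** ALPHA + B / n_tokens ** BETA


def simplex_grid(k, steps):
    """Yield every length-k weight vector whose entries are multiples of 1/steps and sum to 1."""
    for combo in itertools.product(range(steps + 1), repeat=k - 1):
        rest = steps - sum(combo)
        if rest >= 0:
            yield tuple(c / steps for c in combo) + (rest / steps,)


def best_mixture(n_params, n_tokens, k=3, steps=20):
    """Grid search for the domain weights with the lowest predicted loss."""
    return min(simplex_grid(k, steps),
               key=lambda h: predicted_loss(n_params, n_tokens, h))


if __name__ == "__main__":
    h_star = best_mixture(n_params=1e9, n_tokens=2e10)
    print("weights:", h_star)
    print("predicted loss:", predicted_loss(1e9, 2e10, h_star))
```

In this assumed additive form the mixture term does not interact with N or D, so the grid search returns the same weights at every scale. A form whose parameters depend jointly on the mixture and the scale would not have that property, and a gradient-based optimizer would replace the grid search when there are many domains.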
Abstract
Large foundation models are typically trained on data from multiple domains, with the data mixture (the proportion of each domain used) playing a critical role in model performance. The standard approach to selecting this mixture relies on trial and error, which becomes impractical for large-scale pretraining. We propose a systematic method to determine the optimal data mixture for any target domain using scaling laws. Our approach accurately predicts the loss of a model of size N trained with D tokens and a specific domain weight vector h. We validate the universality of these scaling laws by demonstrating their predictive power in three distinct and large-scale settings: large language model (LLM), native multimodal model (NMM), and large vision model (LVM) pretraining. We further show that these scaling laws can extrapolate to new data mixtures and across scales: their…
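For orientation, the block below writes out the familiar fixed-mixture (Chinchilla-style) law and one way a dependence on the domain weights could enter it. The second line is only an illustrative assumption: this excerpt does not state the functional form the authors use, and the symbols E, A, B, α, and β are introduced here for the sketch.

```latex
% Fixed-mixture (Chinchilla-style) law for a model of size N trained on D tokens:
\mathcal{L}(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

% Illustrative mixture-aware extension (an assumption, not taken from this excerpt):
\mathcal{L}(N, D, h) = E(h) + \frac{A(h)}{N^{\alpha(h)}} + \frac{B(h)}{D^{\beta(h)}},
\qquad h_i \ge 0, \quad \sum_i h_i = 1
```

Under any such form, the optimal mixture for a given budget is the weight vector h on the simplex that minimizes the predicted loss on the target domain.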
Taxonomy
Topics: Bayesian Methods and Mixture Models
