Practical Scaling Laws: Converting Compute into Performance in a Data-Constrained World
Christopher M. Bryant, Hao Liu

TL;DR
This paper introduces a scaling law that predicts model performance across data regimes, including data-scarce and multi-epoch training. It addresses limitations of earlier laws and enables cost-effective training strategies.
Contribution
It proposes a closed-form extension of existing scaling laws that accounts for overfitting, data scarcity, and multiple epochs, validated across diverse architectures and domains.
Findings
The new model outperforms previous laws in extrapolation accuracy.
It fits multiple published LLM scaling-law datasets well.
The model enables cost-aware training optimization.
Abstract
The scaling laws guiding modern model training were calibrated for a single regime: data-rich, single-epoch pretraining. The dominant form, Chinchilla's $L(N, D) = E + A/N^{\alpha} + B/D^{\beta}$, has three structural limitations outside that regime: it diverges as unique data shrinks instead of saturating at the uninformed baseline; it cannot represent overfitting when capacity exceeds the data; and it conflates total examples seen with unique examples available. We propose a closed-form extension that decomposes loss into undercapacity, undertraining, and overfitting terms. It saturates between the irreducible loss and an uninformed baseline fixed by the loss type, and reduces to Chinchilla in the data-rich, single-epoch limit. We validate it on four multi-epoch…
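The first and third limitations named in the abstract can be seen numerically with a minimal sketch of the standard Chinchilla form alone. This is not the paper's proposed extension, whose formula is not given here. The constants are the parametric fit reported by Hoffmann et al. (2022) and are used only for illustration; the 32,000-token vocabulary, the 1B-parameter model, and the choice of ln(V) (the cross-entropy of a uniform prediction) as the "uninformed baseline" are assumptions made for this example.

```python
import math

# Chinchilla parametric fit reported by Hoffmann et al. (2022).
# Used here only as illustrative constants, not as values from this paper.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Standard Chinchilla form: L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def uniform_baseline(vocab_size: int) -> float:
    """Cross-entropy (in nats) of a uniform guess over the vocabulary.

    Assumption: this stands in for the 'uninformed baseline' of a
    next-token cross-entropy loss.
    """
    return math.log(vocab_size)


if __name__ == "__main__":
    n_params = 1e9                       # hypothetical 1B-parameter model
    baseline = uniform_baseline(32_000)  # hypothetical vocabulary size

    # Limitation 1: the data term grows without bound as D shrinks, so the
    # predicted loss exceeds what an uninformed model would achieve.
    for n_tokens in (1e11, 1e8, 1e5, 1e2):
        loss = chinchilla_loss(n_params, n_tokens)
        note = "  <- worse than uniform guessing" if loss > baseline else ""
        print(f"D={n_tokens:.0e}  predicted loss={loss:8.2f}{note}")

    # Limitation 3: the form only sees total tokens, so one epoch over 100B
    # unique tokens and four epochs over 25B unique tokens look identical.
    one_epoch = chinchilla_loss(n_params, 100e9 * 1)
    four_epochs = chinchilla_loss(n_params, 25e9 * 4)
    print(f"1 x 100B: {one_epoch:.4f}   4 x 25B: {four_epochs:.4f}")
```

With these constants the prediction at very small D rises above ln(32000), which no sensibly trained model should exceed; that is the divergence the abstract describes, and it motivates a form that saturates instead.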
