Loss-to-Loss Prediction: Scaling Laws for All Datasets
David Brandfonbrener, Nikhil Anand, Nikhil Vyas, Eran Malach, Sham Kakade

TL;DR
This paper develops a method to predict loss across different datasets and tasks using simple shifted power law relationships, extending scaling law predictions beyond single datasets and improving accuracy in diverse settings.
Contribution
It introduces a strategy for predicting loss across datasets and tasks using shifted power laws, expanding the applicability of scaling laws in machine learning.
Findings
Predictions extrapolate well at 20x the largest FLOP budget used for fitting.
Simple shifted power laws relate train and test losses across datasets.
These relationships outperform single-dataset scaling laws in some cases.
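To make the idea of a shifted power law between two losses concrete, here is a minimal sketch in Python. It assumes the form y = K * (x - E0)^kappa + E1, where x is the loss of a model on one dataset and y is the loss on another; this parameterization, the grid-search fitting routine, the function names, and the synthetic numbers are all illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np

def shifted_power_law(x, K, kappa, E0, E1):
    # Assumed form (not taken from the summary above): y = K * (x - E0)**kappa + E1
    return K * np.power(x - E0, kappa) + E1

def fit_shifted_power_law(x, y, E0_grid, E1_grid):
    """Hypothetical fitting routine: grid-search the shifts E0 and E1, and for
    each pair solve for K and kappa by linear least squares in log space."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    best = None
    for E0 in E0_grid:
        for E1 in E1_grid:
            # Shifts must lie strictly below every observed loss so the logs exist
            if E0 >= x.min() or E1 >= y.min():
                continue
            # log(y - E1) = log(K) + kappa * log(x - E0)
            kappa, log_K = np.polyfit(np.log(x - E0), np.log(y - E1), 1)
            params = (np.exp(log_K), kappa, E0, E1)
            mse = np.mean((shifted_power_law(x, *params) - y) ** 2)
            if best is None or mse < best[0]:
                best = (mse, params)
    return best[1]

# Synthetic, noiseless losses generated from made-up parameters (illustration only)
true_params = (1.2, 0.9, 1.8, 2.0)
loss_a = np.linspace(2.2, 3.5, 8)                   # losses on dataset A
loss_b = shifted_power_law(loss_a, *true_params)    # paired losses on dataset B

params = fit_shifted_power_law(
    loss_a, loss_b,
    E0_grid=np.linspace(0.0, 2.0, 41),
    E1_grid=np.linspace(0.0, 2.4, 49),
)

# Extrapolate to a lower loss on A than any point used for fitting
loss_a_new = 2.0
pred = shifted_power_law(loss_a_new, *params)
target = shifted_power_law(loss_a_new, *true_params)
print("fitted (K, kappa, E0, E1):", params)
print("predicted vs. true loss on B:", pred, target)
```

The shifts E0 and E1 play the role of irreducible-loss offsets, which is why the search restricts them to values below the smallest observed loss. With real, noisy measurements a nonlinear optimizer would likely be a better choice than a coarse grid; the grid is used here only to keep the sketch dependent on NumPy alone.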
Abstract
While scaling laws provide a reliable methodology for predicting train loss across compute scales for a single data distribution, less is known about how these predictions should change as we change the distribution. In this paper, we derive a strategy for predicting one loss from another and apply it to predict across different pre-training datasets and from pre-training data to downstream task data. Our predictions extrapolate well even at 20x the largest FLOP budget used to fit the curves. More precisely, we find that there are simple shifted power law relationships between (1) the train losses of two models trained on two separate datasets when the models are paired by training compute (train-to-train), (2) the train loss and the test loss on any downstream distribution for a single model (train-to-test), and (3) the test losses of two models trained on two separate train datasets…
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Modeling and Causal Inference · Machine Learning in Healthcare
