4+3 Phases of Compute-Optimal Neural Scaling Laws
Elliot Paquette, Courtney Paquette, Lechao Xiao, Jeffrey Pennington

TL;DR
This paper analyzes a solvable neural scaling model and identifies four main phases and three subphases of compute-optimal behavior, yielding new theoretical predictions for the compute-optimal model size as a function of data complexity and target complexity.
Contribution
It develops a solvable neural scaling model that predicts compute-optimal model sizes and phase boundaries, supported by mathematical proofs and extensive numerical validation.
Findings
Identifies 4 main phases and 3 subphases in neural scaling behavior.
Derives scaling-law exponents for all phases and subphases.
Provides formulas for optimal model-parameter-count as a function of compute budget.
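To illustrate what "optimal model-parameter-count as a function of compute budget" means, here is a minimal sketch under stated assumptions. The loss form `d**-a + r**-b`, the exponents `a` and `b`, and the compute accounting `f = d * r` (parameters times iterations) are hypothetical placeholders chosen for illustration; they are not the formulas or exponents derived in the paper.

```python
import numpy as np

def toy_loss(d, r, a=1.0, b=0.5):
    """Hypothetical loss: a capacity term d**-a plus an optimization term r**-b."""
    return d ** (-a) + r ** (-b)

def compute_optimal_d(f, a=1.0, b=0.5, grid_size=20001):
    """Grid-search the d that minimizes toy_loss(d, f / d) at fixed compute f = d * r."""
    d_grid = np.logspace(0.0, np.log10(f), grid_size)  # candidate model sizes in [1, f]
    losses = toy_loss(d_grid, f / d_grid, a, b)        # iterations r = f / d
    return d_grid[np.argmin(losses)]

def closed_form_d(f, a=1.0, b=0.5):
    """Stationary point of d**-a + (f / d)**-b: d* = ((a / b) * f**b) ** (1 / (a + b))."""
    return ((a / b) * f ** b) ** (1.0 / (a + b))

if __name__ == "__main__":
    for f in (1e6, 1e8):
        print(f, compute_optimal_d(f), closed_form_d(f))
```

In this toy setting the optimal size grows as a power of compute, `d* ∝ f**(b / (a + b))`. The paper's contribution is to derive the analogous exponents for its own model in each phase.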
Abstract
We consider the solvable neural scaling model with three parameters: data complexity, target complexity, and model-parameter-count. We use this neural scaling model to derive new predictions about the compute-limited, infinite-data scaling law regime. To train the neural scaling model, we run one-pass stochastic gradient descent on a mean-squared loss. We derive a representation of the loss curves which holds over all iteration counts and improves in accuracy as the model parameter count grows. We then analyze the compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in the data-complexity/target-complexity phase-plane. The phase boundaries are determined by the relative importance of model capacity, optimizer noise, and embedding of the features. We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of…
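The training setup described in the abstract (one-pass stochastic gradient descent on a mean-squared loss, with a fresh sample at every step) can be sketched as follows. The concrete construction is an assumption made for illustration, not taken from the text above: features with power-law scales `j**-alpha` (standing in for data complexity), target coefficients `j**-beta` (standing in for target complexity), and a random Gaussian embedding `W` into `d` trainable parameters. The function name, dimensions, and step size are likewise hypothetical.

```python
import numpy as np

def run_one_pass_sgd(alpha=1.0, beta=0.7, d=50, v=500, steps=2000, lr=0.1, seed=0):
    """One-pass SGD on a squared loss for an assumed power-law random-features model."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, v + 1)
    scale = j ** (-alpha)   # assumed feature scales (data complexity)
    b = j ** (-beta)        # assumed target coefficients (target complexity)
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(v, d))  # random embedding into d parameters

    def population_loss(theta):
        # E[(x @ (W @ theta - b))**2] for x = scale * z with z standard normal
        resid = W @ theta - b
        return float(np.sum((scale * resid) ** 2))

    theta = np.zeros(d)
    losses = [population_loss(theta)]
    for _ in range(steps):
        x = scale * rng.normal(size=v)   # fresh sample each step: no data reuse
        err = x @ (W @ theta - b)        # prediction error on this sample
        theta -= lr * err * (W.T @ x)    # gradient of 0.5 * err**2 with respect to theta
        losses.append(population_loss(theta))
    return np.array(losses)

if __name__ == "__main__":
    curve = run_one_pass_sgd()
    print(curve[0], curve[-1])
```

Because each step draws a new sample, the number of iterations equals the number of samples seen, which is what makes the compute-limited, infinite-data regime natural to study in this setup.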
Taxonomy
Topics: Neural Networks and Applications
