4+3 Phases of Compute-Optimal Neural Scaling Laws

Elliot Paquette; Courtney Paquette; Lechao Xiao; Jeffrey Pennington

arXiv:2405.15074 · stat.ML · April 22, 2025 · 1 citation

TL;DR

This paper analyzes a solvable neural scaling model and identifies four main phases and three subphases of compute-optimal behavior, giving new theoretical predictions for the compute-optimal model-parameter-count as a function of data complexity and target complexity.

Contribution

The paper uses a solvable neural scaling model, trained with one-pass stochastic gradient descent on a mean-squared loss, to predict compute-optimal model sizes and phase boundaries, supported by mathematical proof and extensive numerical evidence.

Findings

01. Identifies 4 main phases and 3 subphases in neural scaling behavior.

02. Derives scaling-law exponents for all phases and subphases.

03. Provides formulas for optimal model-parameter-count as a function of compute budget.
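
To make the idea of a compute-optimal parameter count concrete, the following is a minimal sketch under stated assumptions, not the paper's formulas. It assumes a hypothetical two-term loss `d**(-a) + steps**(-b)` (the exponents `a` and `b` and the function names are invented for illustration) and a compute budget equal to `d * steps`. Under that toy form, minimizing over `d` at fixed compute gives an optimum proportional to `compute**(b / (a + b))`, which the grid search below should roughly reproduce.

```python
def toy_loss(d, steps, a=1.0, b=0.5):
    """Hypothetical loss: a capacity term in d plus an optimization term in steps."""
    return d ** (-a) + steps ** (-b)


def compute_optimal_d(compute, a=1.0, b=0.5, grid=None):
    """Grid-search the parameter count d that minimizes toy_loss
    when the budget is spent as compute = d * steps."""
    if grid is None:
        grid = [2 ** k for k in range(1, 21)]  # powers of two up to 2**20
    candidates = [d for d in grid if d <= compute]
    return min(candidates, key=lambda d: toy_loss(d, compute / d, a, b))


def closed_form_d(compute, a=1.0, b=0.5):
    """Stationary point of d**(-a) + (compute / d)**(-b) with respect to d."""
    return (a / b) ** (1.0 / (a + b)) * compute ** (b / (a + b))


if __name__ == "__main__":
    for budget in (2 ** 20, 2 ** 30, 2 ** 40):
        print(budget, compute_optimal_d(budget), round(closed_form_d(budget)))
```

The paper's actual exponents depend on the phase and are derived from its model; the toy loss here only illustrates why a fixed compute budget induces a trade-off between parameter count and number of training steps.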

Abstract

We consider the solvable neural scaling model with three parameters: data complexity, target complexity, and model-parameter-count. We use this neural scaling model to derive new predictions about the compute-limited, infinite-data scaling law regime. To train the neural scaling model, we run one-pass stochastic gradient descent on a mean-squared loss. We derive a representation of the loss curves which holds over all iteration counts and improves in accuracy as the model parameter count grows. We then analyze the compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in the data-complexity/target-complexity phase-plane. The phase boundaries are determined by the relative importance of model capacity, optimizer noise, and embedding of the features. We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of…
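
The training procedure described in the abstract, one-pass stochastic gradient descent on a mean-squared loss, can be sketched as follows. This is an illustrative setup rather than the paper's exact construction: it assumes a power-law random-features model in which `alpha` stands in for data complexity, `beta` for target complexity, and `d` for the model-parameter-count. The latent dimension `v`, the Gaussian projection `W`, the learning rate `lr`, and the function name are all assumptions introduced here.

```python
import numpy as np


def one_pass_sgd(d, steps, alpha=1.0, beta=0.7, v=500, lr=0.2, seed=0):
    """Run one-pass SGD on a squared loss and return the population loss
    after every step (assumed power-law random-features setup)."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, v + 1, dtype=float)
    scale = j ** (-alpha)      # assumed std of each latent coordinate
    b = j ** (-beta)           # assumed target coefficients
    W = rng.normal(size=(v, d)) / np.sqrt(d)  # assumed random projection
    theta = np.zeros(d)

    def population_loss(params):
        # E[(<x, W params> - <x, b>)^2] for x with independent coordinates
        resid = W @ params - b
        return float(np.sum((scale * resid) ** 2))

    losses = [population_loss(theta)]
    for _ in range(steps):
        x = scale * rng.normal(size=v)  # fresh sample every step: one pass
        feats = W.T @ x                 # d-dimensional features of the sample
        err = feats @ theta - x @ b     # prediction error on this sample
        theta -= lr * err * feats       # gradient step on 0.5 * err**2
        losses.append(population_loss(theta))
    return np.array(losses)


if __name__ == "__main__":
    curve = one_pass_sgd(d=50, steps=2000)
    print(curve[0], curve[-1])
```

Because every step draws a new sample, the number of steps equals the number of samples seen, so compute in this sketch grows like `d * steps`. Sweeping `d` at a fixed product `d * steps` gives an empirical view of the compute-optimal trade-off the abstract analyzes.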

Taxonomy

Topics: Neural Networks and Applications