Explaining Neural Scaling Laws

Yasaman Bahri; Ethan Dyer; Jared Kaplan; Jaehoon Lee; Utkarsh Sharma

arXiv:2102.06701 · cs.LG · June 28, 2024 · 32 cites

Open Access

TL;DR

This paper develops a theoretical framework explaining the power-law scaling of neural network loss with dataset size and model size. It identifies four regimes (variance-limited and resolution-limited, each with respect to dataset size and model size) and supports them with empirical evidence across various architectures.

Contribution

It introduces a unified theory connecting variance-limited and resolution-limited scaling regimes, providing a taxonomy and insights into the microscopic origins of scaling laws in neural networks.

Findings

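01  Identifies four scaling regimes: variance-limited and resolution-limited behavior, each for dataset size and model size.

02  Presents evidence for a duality relating the resolution-limited scaling exponents at large width and at large dataset size.

03  Empirically validates theoretical predictions across multiple architectures.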

Abstract

The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four…
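As an illustration of what a power-law scaling relation means in practice, the sketch below estimates a scaling exponent from (size, loss) pairs by linear regression in log-log space. It is a minimal example under stated assumptions, not the authors' procedure: the function name `fit_power_law`, the functional form loss ≈ c · size^(−alpha) with no irreducible loss offset, and the synthetic data are all hypothetical and chosen only for demonstration.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss ~ c * size**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration. Assumes strictly positive inputs
    and no irreducible loss offset. Returns (alpha, c).
    """
    log_size = np.log(np.asarray(sizes, dtype=float))
    log_loss = np.log(np.asarray(losses, dtype=float))
    # A power law is a straight line in log-log coordinates:
    # log(loss) = log(c) - alpha * log(size)
    slope, intercept = np.polyfit(log_size, log_loss, 1)
    return float(-slope), float(np.exp(intercept))


# Synthetic data (not from the paper): loss = 2.0 * D**(-0.5)
dataset_sizes = np.array([1e2, 1e3, 1e4, 1e5])
synthetic_losses = 2.0 * dataset_sizes ** -0.5

alpha, c = fit_power_law(dataset_sizes, synthetic_losses)
print(f"alpha = {alpha:.3f}, c = {c:.3f}")
```

On real measurements the loss typically approaches a nonzero floor, so a fit of this kind would need an additional offset term or a restricted size range; that choice is left open here.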

Taxonomy

Topics: Neural Networks and Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)