Explaining Neural Scaling Laws
Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma

TL;DR
This paper develops a theoretical framework that explains the power-law scaling of neural network loss with training dataset size and model size. It identifies four scaling regimes (variance-limited and resolution-limited, each for dataset size and for model size) and supports them with empirical evidence across various architectures.
Contribution
The paper introduces a unified theory connecting the variance-limited and resolution-limited scaling regimes, and provides a taxonomy of scaling behavior along with insights into the microscopic origins of scaling laws in neural networks.
Findings
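Identifies four distinct scaling regimes: variance-limited and resolution-limited scaling, each with respect to dataset size and model size.
Presents evidence that the resolution-limited scaling exponents for large width and for large dataset size are related by a duality.
Empirically validates the theoretical predictions across multiple architectures.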
Abstract
The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four…
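As a minimal illustration of what a power-law scaling relation looks like in practice, the sketch below estimates a scaling exponent from (size, loss) pairs by fitting a straight line in log-log space. It is not the authors' procedure: the function name `fit_power_law`, the functional form `loss ≈ c * size**(-alpha)` with no irreducible-loss offset, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss ≈ c * size**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration; assumes strictly positive
    inputs and no additive loss floor.
    """
    log_sizes = np.log(np.asarray(sizes, dtype=float))
    log_losses = np.log(np.asarray(losses, dtype=float))
    # log(loss) = log(c) - alpha * log(size), so the slope is -alpha.
    slope, intercept = np.polyfit(log_sizes, log_losses, 1)
    return -slope, np.exp(intercept)


# Synthetic, noise-free data generated from an assumed exponent of 0.5.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
losses = 2.0 * sizes ** -0.5

alpha, c = fit_power_law(sizes, losses)
print(f"alpha = {alpha:.3f}, c = {c:.3f}")
```

Here `sizes` could stand for either the number of training examples or a measure of model size such as width. With real measurements, a loss that plateaus at a nonzero value would need an additive offset term before a fit of this kind is meaningful.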
Taxonomy
Topics: Neural Networks and Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
