Information-Theoretic Foundations for Neural Scaling Laws
Hong Jun Jeon, Benjamin Van Roy

TL;DR
This paper establishes rigorous information-theoretic foundations for neural scaling laws. It characterizes how error depends on model size and data size when the data is generated by a two-layer neural network of infinite width, and the resulting optimal allocation agrees with large-scale empirical observations.
Contribution
The paper develops a rigorous information-theoretic framework for neural scaling laws, addressing the lack of rigor and clarity in prior theoretical work and clarifying how the optimal data size relates to model size.
Findings
The optimal relation between data size and model size is linear, up to logarithmic factors.
This result corroborates large-scale empirical investigations.
The results are concise yet general, and may bring clarity to the topic and inform future investigations.
Abstract
Neural scaling laws aim to characterize how out-of-sample error behaves as a function of model and training dataset size. Such scaling laws guide allocation of computational resources between model and data processing to minimize error. However, existing theoretical support for neural scaling laws lacks rigor and clarity, entangling the roles of information and optimization. In this work, we develop rigorous information-theoretic foundations for neural scaling laws. This allows us to characterize scaling laws for data generated by a two-layer neural network of infinite width. We observe that the optimal relation between data and model size is linear, up to logarithmic factors, corroborating large-scale empirical investigations. Concise yet general results of the kind we establish may bring clarity to this topic and inform future investigations.
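To make the idea of a compute-optimal allocation concrete, here is a minimal Python sketch. It is an illustration only and does not reproduce the paper's bounds. Two things in it are assumptions that do not come from the abstract: the error form `toy_error` (a symmetric `a / model_size + b / data_size`), and the compute budget being the product `model_size * data_size`. Under these assumptions the error-minimizing data size works out to `(b / a) * model_size`, a linear relation, which is the kind of relation the abstract describes (there, up to logarithmic factors).

```python
def toy_error(model_size: float, data_size: float, a: float = 1.0, b: float = 1.0) -> float:
    """Hypothetical error model, NOT the paper's bound: a/model_size + b/data_size."""
    return a / model_size + b / data_size


def best_allocation(compute: float, a: float = 1.0, b: float = 1.0, grid: int = 2001):
    """Grid-search the split of a compute budget that minimizes toy_error.

    Assumes (hypothetically) that compute = model_size * data_size.
    Returns a tuple (model_size, data_size, error).
    """
    best = None
    for i in range(grid):
        # Sweep model_size log-uniformly from 1 up to the full budget.
        model_size = compute ** (i / (grid - 1))
        data_size = compute / model_size
        err = toy_error(model_size, data_size, a, b)
        if best is None or err < best[2]:
            best = (model_size, data_size, err)
    return best


budgets = [1e6, 1e8, 1e10]
allocations = [best_allocation(c) for c in budgets]
# Ratio of optimal data size to optimal model size at each budget.
ratios = [data / model for model, data, _ in allocations]

for c, (model, data, err) in zip(budgets, allocations):
    print(f"compute={c:.0e}  model_size={model:.3g}  data_size={data:.3g}  error={err:.3g}")
```

With the default `a = b = 1`, the ratio of optimal data size to optimal model size stays constant as the budget grows, which is what a linear relation means in this toy setting. A different error form would generally give a different relation, so the sketch shows the mechanics of the allocation problem and should not be read as evidence for the paper's result.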
Taxonomy
Topics: Neural Networks and Applications
