A Neural Scaling Law from the Dimension of the Data Manifold

Utkarsh Sharma; Jared Kaplan

arXiv:2004.10802·cs.LG·April 24, 2020·20 cites

TL;DR

The loss of well-trained neural networks follows a power law in model size, and this paper ties the exponent of that law to the intrinsic dimension of the data manifold, with supporting experiments across several model types and data modalities.

Contribution

The paper proposes a theoretical explanation that connects neural scaling exponents to the intrinsic dimension of the data manifold, and tests it in teacher/student experiments, CNN image classifiers, and GPT-type language models.

Findings

01

When data is plentiful, loss scales as a power law in the number of network parameters N: L ∝ N^(-α).

02

The theory predicts a scaling exponent α ≈ 4/d, where d is the intrinsic dimension of the data manifold, for both cross-entropy and mean-squared error losses.

03

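The prediction is tested in a teacher/student framework with random teacher networks, with CNN image classifiers on several datasets, and with GPT-type language models.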
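
A heuristic sketch of where the exponent 4/d can come from, under assumptions that are not stated on this page: a piecewise-linear network approximating a smooth target, with its capacity spread roughly evenly over the manifold. The symbols $s$ (typical linear size of a region), $f$ (target function) and $\hat{f}$ (network output) are introduced here purely for illustration. If $N$ parameters divide a $d$-dimensional manifold into regions of size $s$, a piecewise-linear fit has pointwise error of order $s^2$, and a loss that is quadratic near its minimum scales as the square of that error:

$$ s \propto N^{-1/d}, \qquad |f - \hat{f}| \sim s^{2}, \qquad L \sim |f - \hat{f}|^{2} \sim s^{4} \propto N^{-4/d} \quad\Rightarrow\quad \alpha \approx 4/d $$

As a worked instance, $d = 8$ would give $\alpha \approx 0.5$, so under this sketch a tenfold increase in $N$ would reduce the loss by a factor of about $10^{0.5} \approx 3.2$.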

Abstract

When data is plentiful, the loss achieved by well-trained neural networks scales as a power-law $L \propto N^{-\alpha}$ in the number of network parameters $N$. This empirical scaling law holds for a wide variety of data modalities, and may persist over many orders of magnitude. The scaling law can be explained if neural models are effectively just performing regression on a data manifold of intrinsic dimension $d$. This simple theory predicts that the scaling exponents $\alpha \approx 4/d$ for cross-entropy and mean-squared error losses. We confirm the theory by independently measuring the intrinsic dimension and the scaling exponents in a teacher/student framework, where we can study a variety of $d$ and $\alpha$ by dialing the properties of random teacher networks. We also test the theory with CNN image classifiers on several datasets and with GPT-type language models.
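
To make the relation $L \propto N^{-\alpha}$ concrete, here is a minimal Python sketch of how a scaling exponent could be estimated from (parameter count, loss) pairs and compared with the predicted value $4/d$. It is not the authors' code. The function names, the prefactor 2.5, the choice $d = 8$, and the synthetic noise-free losses are all assumptions made for illustration, and the fit is an ordinary least-squares line in log-log space.

```python
import numpy as np


def fit_scaling_exponent(n_params, losses):
    """Fit log L = log c - alpha * log N by least squares.

    Returns (alpha, c). Hypothetical helper, not from the paper.
    """
    log_n = np.log(np.asarray(n_params, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    slope, intercept = np.polyfit(log_n, log_l, 1)
    return float(-slope), float(np.exp(intercept))


def predicted_exponent(intrinsic_dim):
    """Exponent predicted by the theory in the abstract: alpha ~ 4/d."""
    return 4.0 / intrinsic_dim


# Synthetic, noise-free data (an assumption for illustration only):
# losses generated exactly from L = c * N^(-4/d) with d = 8, c = 2.5.
d = 8
n_params = np.array([1e3, 1e4, 1e5, 1e6])
losses = 2.5 * n_params ** (-predicted_exponent(d))

alpha, c = fit_scaling_exponent(n_params, losses)
implied_dim = 4.0 / alpha  # dimension implied by the fitted exponent

print(f"fitted alpha = {alpha:.3f}, predicted 4/d = {predicted_exponent(d):.3f}")
print(f"implied intrinsic dimension = {implied_dim:.2f}")
```

With real measurements, the losses would come from trained models of different sizes and $d$ from a separate intrinsic-dimension estimate, so the fitted and predicted exponents would agree only approximately.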

Taxonomy

Topics: Neural Networks and Applications · Image Processing and 3D Reconstruction · Face and Expression Recognition