High-dimensional dynamics of generalization error in neural networks

Madhu S. Advani; Andrew M. Saxe

arXiv:1710.03667·stat.ML·October 11, 2017

TL;DR

This paper analyzes the high-dimensional generalization dynamics of neural networks trained with gradient descent, showing how network size and the scale of the initial weights affect overtraining and generalization. The analysis combines random matrix theory with exact solutions in linear models.

Contribution

The paper introduces a high-dimensional analysis of neural network generalization, identifying phenomena such as frozen subspaces and input conditioning that explain overtraining behavior.

Findings

01

Large networks can reduce overtraining without regularization.

02

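Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of training samples; making the network smaller or larger reduces it.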

03

Small initial weights are crucial for good generalization in high-dimensional regimes.
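Finding 02 can be illustrated with a small numerical sketch. The setup below is an assumption made for illustration, not the paper's code: a noisy linear teacher, Gaussian inputs with variance 1/N per coordinate, and a linear student fitted with the minimum-norm least-squares solution, which is the point that gradient descent from zero initial weights converges to when run indefinitely. The function names, the noise level, and the list of ratios are all hypothetical choices.

```python
import numpy as np


def asymptotic_generalization_error(n_features, n_samples, noise_std=0.5, seed=0):
    """Generalization error of the fully trained linear student (illustrative setup)."""
    rng = np.random.default_rng(seed)
    # Assumed data model: inputs with variance 1/N per coordinate, unit-variance teacher.
    X = rng.normal(0.0, 1.0 / np.sqrt(n_features), size=(n_samples, n_features))
    w_teacher = rng.normal(0.0, 1.0, size=n_features)
    y = X @ w_teacher + rng.normal(0.0, noise_std, size=n_samples)
    # Minimum-norm least-squares fit: the long-time limit of gradient descent
    # started from zero weights.
    w_fit = np.linalg.pinv(X) @ y
    # Expected squared error on a fresh noisy example under the same input model.
    return float(np.sum((w_fit - w_teacher) ** 2) / n_features + noise_std ** 2)


def sweep_sample_ratio(n_features=100, ratios=(0.25, 0.5, 1.0, 2.0, 4.0), n_trials=5):
    """Average the fully trained error over a few seeds for each samples/parameters ratio."""
    results = {}
    for ratio in ratios:
        n_samples = int(round(ratio * n_features))
        errors = [
            asymptotic_generalization_error(n_features, n_samples, seed=trial)
            for trial in range(n_trials)
        ]
        results[ratio] = float(np.mean(errors))
    return results


if __name__ == "__main__":
    for ratio, error in sweep_sample_ratio().items():
        print(f"samples/parameters = {ratio:4.2f}  generalization error = {error:.3f}")
```

In this sketch the error of the fully trained student is largest when the number of samples equals the number of parameters and falls off on either side, which is the qualitative pattern the finding describes. The exact values depend on the assumed noise level and problem size.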

Abstract

We perform an average-case analysis of the generalization dynamics of large neural networks trained using gradient descent. We study the practically relevant "high-dimensional" regime where the number of free parameters in the network is on the order of or even larger than the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of data and signal-to-noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in…
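The training and generalization error dynamics mentioned in the abstract can be sketched for the simplest case, a linear student trained by full-batch gradient descent. This is a minimal illustration under stated assumptions, not the authors' implementation: the teacher-student data model, the zero initialization, the learning rate, and the function name `simulate_linear_gd` are all hypothetical choices made for this example.

```python
import numpy as np


def simulate_linear_gd(n_features=100, n_samples=100, noise_std=0.5,
                       lr=0.1, n_steps=5000, seed=0):
    """Track training and generalization error of a linear student during gradient descent."""
    rng = np.random.default_rng(seed)
    # Assumed data model: inputs with variance 1/N per coordinate, unit-variance teacher.
    X = rng.normal(0.0, 1.0 / np.sqrt(n_features), size=(n_samples, n_features))
    w_teacher = rng.normal(0.0, 1.0, size=n_features)
    y = X @ w_teacher + rng.normal(0.0, noise_std, size=n_samples)

    w = np.zeros(n_features)  # zero initial weights (an assumption of this sketch)
    train_err = np.empty(n_steps)
    gen_err = np.empty(n_steps)
    for step in range(n_steps):
        residual = X @ w - y
        train_err[step] = np.mean(residual ** 2)
        # Expected squared error on a fresh noisy example under the same input model.
        gen_err[step] = np.sum((w - w_teacher) ** 2) / n_features + noise_std ** 2
        # Gradient of 0.5 * ||X w - y||^2 with respect to w.
        w -= lr * (X.T @ residual)
    return train_err, gen_err


if __name__ == "__main__":
    train_err, gen_err = simulate_linear_gd()
    best_step = int(np.argmin(gen_err))
    print(f"best early-stopping step: {best_step}")
    print(f"generalization error at best step: {gen_err[best_step]:.3f}")
    print(f"generalization error at final step: {gen_err[-1]:.3f}")
```

With as many samples as parameters, the training error in this sketch keeps falling while the generalization error reaches a minimum at an intermediate step and then rises, so stopping early would give a better model than training to convergence. Changing `n_samples` relative to `n_features` is a simple way to explore how that gap depends on the ratio.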

Taxonomy

Methods: Early Stopping