Understanding deep learning requires rethinking generalization

Chiyuan Zhang; Samy Bengio; Moritz Hardt; Benjamin Recht; Oriol Vinyals

arXiv:1611.03530·cs.LG·February 28, 2017·1.1k cites

TL;DR

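This paper challenges traditional explanations for why large neural networks generalize well. It shows that such networks can fit randomly labeled data, and even random noise inputs, so neither properties of the model family nor explicit regularization can by themselves account for their small generalization error. The authors argue that this calls for a rethink of generalization theory.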

Contribution

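It demonstrates through systematic experiments that conventional explanations, such as explicit regularization or properties of the model family, are insufficient to explain generalization. It supports these experiments with a theoretical result on finite-sample expressivity: a network with more parameters than training points can fit any labeling of those points.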

Findings

01

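Convolutional networks trained with stochastic gradient methods easily fit a random labeling of the training data, and explicit regularization does not qualitatively change this.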

02

Neural networks can represent any labeling of a finite sample once the number of parameters exceeds the number of data points.

03

Traditional explanations, based on properties of the model family or on regularization, fail to explain why large neural networks generalize well in practice.
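Finding 02 can be made concrete with a small construction. The following is a minimal sketch, not the authors' code. It assumes a two-layer ReLU network of the form f(z) = sum_j w_j * max(<a, z> - b_j, 0) with d + 2n parameters for n points in d dimensions. The names `fit_two_layer_relu` and `predict` are hypothetical. The idea is to project the inputs onto a random direction, place one ReLU threshold below each sorted projection, and solve the resulting lower-triangular system for the output weights.

```python
import numpy as np

def fit_two_layer_relu(Z, y, rng):
    """Return (a, b, w) such that sum_j w[j] * relu(Z @ a - b[j]) matches y on every row of Z."""
    n, d = Z.shape
    a = rng.standard_normal(d)       # random direction; projections are distinct almost surely
    x = Z @ a
    order = np.argsort(x)
    xs = x[order]
    # Thresholds interleave the sorted projections: b_1 < x_(1) < b_2 < x_(2) < ...
    b = np.empty(n)
    b[0] = xs[0] - 1.0
    b[1:] = (xs[:-1] + xs[1:]) / 2.0
    # A[i, j] = relu(x_(i) - b_j) is lower triangular with a positive diagonal, hence invertible.
    A = np.maximum(xs[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])
    return a, b, w

def predict(Z, a, b, w):
    return np.maximum((Z @ a)[:, None] - b[None, :], 0.0) @ w

rng = np.random.default_rng(0)
n, d = 20, 5
Z = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=n).astype(float)   # arbitrary (random) binary labels
a, b, w = fit_two_layer_relu(Z, y, rng)
max_err = np.max(np.abs(predict(Z, a, b, w) - y))
n_params = a.size + b.size + w.size            # d + 2n
print(f"max fitting error: {max_err:.2e}, parameters: {n_params}")
```

The labels play no role in choosing `a` or `b`, so the same network shape fits any labeling of the sample. This says nothing about how well the fitted function predicts on new data.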

Abstract

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical…
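The randomization test described in the abstract can be imitated at toy scale. The following is a hedged sketch under strong simplifying assumptions: it swaps the convolutional network trained with stochastic gradient methods for fixed random ReLU features with a minimum-norm least-squares readout, and it uses Gaussian noise inputs with coin-flip labels. All sizes and names (`features`, `n_features`, and so on) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d, n_features = 100, 1000, 20, 500

# Inputs are pure Gaussian noise; labels are coin flips independent of the inputs.
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = rng.choice([-1.0, 1.0], size=n_train)
y_test = rng.choice([-1.0, 1.0], size=n_test)

# Fixed random ReLU features: a stand-in for a trained hidden layer (assumption).
W = rng.standard_normal((d, n_features))

def features(X):
    return np.maximum(X @ W, 0.0)

# More features than training points, so least squares can interpolate the labels.
coef, *_ = np.linalg.lstsq(features(X_train), y_train, rcond=None)

train_acc = np.mean(np.sign(features(X_train) @ coef) == y_train)
test_acc = np.mean(np.sign(features(X_test) @ coef) == y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Because the labels carry no information about the inputs, test accuracy can only hover around chance, while the over-parameterized model still fits the training labels. That gap between training and test performance is what the randomization test is designed to expose.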

Code & Models

Datasets

christopher/mnist1d
dataset · 7 dl

Videos

'How neural networks learn' - Part III: Generalization and Overfitting· youtube

Gradient descent, how neural networks learn | Deep Learning Chapter 2· youtube

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference