Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals

TL;DR
This paper challenges traditional explanations for why large neural networks generalize well. It shows that they can easily fit random labels and even random noise, so neither the properties of the model family nor regularization is enough to explain their performance. This calls for a rethink of generalization theory.
Contribution
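It demonstrates that conventional explanations, such as properties of the model family or explicit regularization, are insufficient to explain generalization. It also gives a theoretical construction showing that a simple depth-two network can fit any labeling of a finite sample once its number of parameters exceeds the number of data points.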
Findings
Neural networks can perfectly fit random labels regardless of regularization.
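Once the number of parameters exceeds the number of data points, even a depth-two network can express any labeling of the training sample.
Traditional explanations, such as properties of the model family or explicit regularization, cannot fully account for the generalization behavior of neural networks.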
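The finite-sample expressivity claim can be made concrete with a small construction. The sketch below builds a depth-two ReLU network c(x) = sum_j w_j * max(<a, x> - b_j, 0) with 2n + d weights (a has d entries, b and w have n each). It projects the n inputs onto a random direction a, interleaves the biases b_j with the sorted projections, and solves a lower-triangular linear system for w. This is a minimal NumPy sketch that assumes the inputs are distinct. The names `fit_depth_two_relu` and `predict` are hypothetical, and the details may differ from the paper's own proof.

```python
import numpy as np

def fit_depth_two_relu(X, y, rng):
    """Return (a, b, w) with sum_j w_j * relu(<a, x_i> - b_j) == y_i for every row x_i.

    Assumes the rows of X are distinct, so a random projection separates them.
    """
    n, d = X.shape
    a = rng.standard_normal(d)           # random direction: d weights
    z = X @ a                             # projections, distinct with probability 1
    order = np.argsort(z)
    z_sorted = z[order]
    # Interleave biases with sorted projections: b_1 < z_1 < b_2 < z_2 < ... < b_n < z_n.
    b = np.empty(n)
    b[0] = z_sorted[0] - 1.0
    b[1:] = (z_sorted[:-1] + z_sorted[1:]) / 2.0
    # A[i, j] = relu(z_i - b_j) is lower triangular with a positive diagonal, hence invertible.
    A = np.maximum(z_sorted[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])      # output weights: n weights
    return a, b, w                        # together with the n biases: 2n + d parameters

def predict(X, a, b, w):
    """Evaluate the depth-two ReLU network on every row of X."""
    return np.maximum((X @ a)[:, None] - b[None, :], 0.0) @ w

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.standard_normal((n, d))                # unstructured random inputs
y = rng.integers(0, 2, size=n).astype(float)   # random binary labels
a, b, w = fit_depth_two_relu(X, y, rng)
```

Because the inputs are pure noise and the labels are random, the exact fit says nothing about generalization. That gap is what the paper argues conventional theory fails to explain.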
Abstract
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical…
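The randomization test described in the abstract trains an unchanged architecture on a modified copy of the training set. The following minimal NumPy sketch covers the two modifications named there: random labels and inputs replaced by unstructured noise. It assumes images are stored as an array of shape (n, h, w, c) and labels as integers in [0, num_classes). The function names are hypothetical. The corruption probability `p` is an assumption that generalizes the fully random case (p = 1.0). Drawing noise with the data's overall mean and standard deviation is one reasonable choice, not necessarily the paper's exact protocol.

```python
import numpy as np

def corrupt_labels(labels, p, num_classes, rng):
    """Replace each label, independently with probability p, by a uniformly random class.

    p = 1.0 gives a completely random labeling of the training data.
    """
    labels = np.asarray(labels)
    mask = rng.random(labels.shape) < p
    random_classes = rng.integers(0, num_classes, size=labels.shape)
    return np.where(mask, random_classes, labels)

def gaussian_noise_like(images, rng):
    """Replace images by i.i.d. Gaussian noise with the same overall mean and std."""
    images = np.asarray(images, dtype=np.float64)
    return rng.normal(images.mean(), images.std(), size=images.shape)

rng = np.random.default_rng(0)
images = rng.random((100, 8, 8, 3))     # stand-in for a real image dataset
labels = rng.integers(0, 10, size=100)  # stand-in for the true labels

random_labels = corrupt_labels(labels, p=1.0, num_classes=10, rng=rng)
noise_images = gaussian_noise_like(images, rng)
```

The next step would be to train the same network and optimizer on `(images, random_labels)` or `(noise_images, labels)` and track training accuracy. The training loop is omitted here.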
Videos
'How neural networks learn' - Part III: Generalization and Overfitting · youtube
Gradient descent, how neural networks learn | Deep Learning Chapter 2 · youtube
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference
