Deep learning: a statistical viewpoint
Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin

TL;DR
This paper reviews recent theoretical progress explaining deep learning's success, focusing on overparametrization, implicit regularization, and benign overfitting, especially in linear regimes and simple models.
Contribution
It synthesizes recent findings on how overparametrization and implicit regularization allow gradient methods to achieve excellent predictive accuracy even when they fit the training data near-perfectly.
Findings
Simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
Gradient methods implicitly regularize: among the solutions that fit the training data, they tend toward ones of minimal norm.
In overparametrized models, overfitting can be benign: a near-perfect fit to the training data need not harm prediction accuracy.
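The implicit-regularization finding can be illustrated in the simplest overparametrized setting. The following is a minimal sketch, not code from the paper. It assumes a linear least-squares model with three parameters and two training points, and plain gradient descent started at zero. The data, step size, and all function names are made up for illustration. Under these assumptions, gradient descent should land on the minimum-norm solution among the many that fit the data exactly.

```python
# Illustrative only: a tiny linear model with 3 parameters and 2 training points.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict(X, w):
    return [dot(row, w) for row in X]

def gradient_descent(X, y, step=0.1, iters=2000):
    """Minimize 0.5 * ||X w - y||^2 by plain gradient descent, starting at w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        residual = [p - t for p, t in zip(predict(X, w), y)]
        # The gradient is X^T residual, a linear combination of the training inputs,
        # so every iterate stays in the span of the training inputs.
        grad = [sum(r * row[j] for r, row in zip(residual, X)) for j in range(len(w))]
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w

def min_norm_interpolant(X, y):
    """Closed form X^T (X X^T)^{-1} y, written out for exactly two training points."""
    (a, b), (c, d) = [[dot(xi, xj) for xj in X] for xi in X]  # Gram matrix X X^T
    det = a * d - b * c
    alpha = [(d * y[0] - b * y[1]) / det, (a * y[1] - c * y[0]) / det]
    return [sum(al * row[j] for al, row in zip(alpha, X)) for j in range(len(X[0]))]

X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # hypothetical training inputs
y = [1.0, 2.0]                          # hypothetical training targets

w_gd = gradient_descent(X, y)
w_mn = min_norm_interpolant(X, y)
w_other = [1.0, 2.0, 0.0]  # another exact fit to the same data, with a larger norm

print("gradient descent:", [round(v, 6) for v in w_gd])
print("minimum norm:    ", [round(v, 6) for v in w_mn])
print("squared norms:", dot(w_mn, w_mn), "vs", dot(w_other, w_other))
```

No norm penalty appears anywhere in the objective. The preference for the small-norm solution comes only from the starting point and the form of the gradient updates, which is the sense in which the regularization is implicit.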
Abstract
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning…
