Explaining generalization in deep learning: progress and fundamental limits
Vaishnavh Nagarajan

TL;DR
This dissertation investigates why deep networks generalize well despite being overparameterized, analyzes the fundamental limits of uniform convergence as an explanation, and proposes an empirical method that uses unlabeled data to estimate generalization.
Contribution
It provides empirical insights into how training implicitly controls network capacity, derives data-dependent generalization bounds with improved dependence on the parameter count, and introduces a novel empirical technique for estimating generalization that does not rely solely on uniform convergence.
Findings
Uniform convergence bounds can be vacuous in overparameterized settings.
Training via stochastic gradient descent implicitly controls network capacity.
An empirical method using unlabeled data accurately estimates generalization.
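The second finding concerns implicit capacity control by SGD. One quantity used to track this, assumed here and drawn from Nagarajan and Kolter's related work on the role of distance from initialization, is how far training moves the weights away from their random initialization. The sketch below measures that distance per layer and overall from lists of NumPy weight arrays; the function name and toy weights are hypothetical.

```python
import numpy as np


def distance_from_init(weights, init_weights):
    """Euclidean (Frobenius) distance between trained and initial weights.

    Returns per-layer distances and the overall distance across all layers.
    Assumes both lists hold arrays of matching shapes in the same layer order.
    """
    if len(weights) != len(init_weights):
        raise ValueError("weights and init_weights must have the same number of layers")
    # With ord=None, np.linalg.norm returns the 2-norm of the flattened array.
    per_layer = [float(np.linalg.norm(w - w0)) for w, w0 in zip(weights, init_weights)]
    # The overall distance treats all parameters as one flattened vector.
    total = float(np.sqrt(sum(d ** 2 for d in per_layer)))
    return per_layer, total


# Toy example: one 2x2 layer that moved during training, one bias vector that did not.
init = [np.ones((2, 2)), np.ones(3)]
trained = [np.ones((2, 2)) + np.array([[3.0, 0.0], [0.0, 4.0]]), np.ones(3)]
per_layer, total = distance_from_init(trained, init)
print(per_layer, total)  # [5.0, 0.0] 5.0
```

Reporting per-layer values alongside the total is a design choice: it shows which layers account for most of the movement, which a single aggregate number hides.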
Abstract
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive data-dependent, uniform-convergence-based generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in the deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we…
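For reference, a uniform-convergence bound in its standard textbook form is sketched below. The notation is assumed here rather than taken from the thesis: a hypothesis class $\mathcal{H}$, a training set $S$ of $m$ i.i.d. samples from a distribution $\mathcal{D}$, test loss $L_{\mathcal{D}}(h)$, training loss $\hat{L}_S(h)$, and failure probability $\delta$.

```latex
\Pr_{S \sim \mathcal{D}^m}\!\left[\, \sup_{h \in \mathcal{H}} \bigl| L_{\mathcal{D}}(h) - \hat{L}_S(h) \bigr| \le \epsilon_{\mathrm{unif}}(m, \delta) \right] \ge 1 - \delta
\quad\Longrightarrow\quad
L_{\mathcal{D}}(h_S) \le \hat{L}_S(h_S) + \epsilon_{\mathrm{unif}}(m, \delta)
\ \text{ with probability at least } 1 - \delta .
```

Because the supremum ranges over every hypothesis in $\mathcal{H}$, the bound holds for whatever network $h_S$ training returns. Data-dependent variants, as the abstract refers to them, tighten $\epsilon_{\mathrm{unif}}$ by restricting the supremum to a smaller subset of $\mathcal{H}$ that depends on the data or on the training algorithm.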
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms
