Explaining generalization in deep learning: progress and fundamental limits

Vaishnavh Nagarajan

arXiv:2110.08922 · cs.LG · October 19, 2021 · 1 citation

TL;DR

This dissertation investigates why deep networks generalize well despite overparameterization, analyzes the limits of uniform convergence as an explanation for that generalization, and proposes an empirical method that uses unlabeled data to estimate generalization.

Contribution

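The dissertation provides empirical evidence that stochastic gradient descent implicitly controls network capacity, derives data-dependent uniform-convergence-based bounds with improved dependence on the parameter count, and introduces an empirical technique for estimating generalization that does not rely solely on uniform convergence.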

Findings

01

Uniform convergence bounds can be vacuous in overparameterized settings.

02

Training via stochastic gradient descent implicitly controls network capacity.

03

An empirical method using unlabeled data accurately estimates generalization.
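For context on the first finding, a uniform-convergence bound in its standard textbook form can be written as below. This is a sketch of the general definition, not the dissertation's own statement; all symbols (the hypothesis class, the sample size, the bound term) are assumed notation that does not appear on this page.

```latex
% Assumed notation (not from the source):
%   \mathcal{H}      hypothesis class (e.g. networks reachable by training)
%   S                training sample of m i.i.d. points drawn from \mathcal{D}
%   L_{\mathcal{D}}  population error, \hat{L}_S empirical error on S
%   \delta           failure probability
\Pr_{S \sim \mathcal{D}^m}\left[
  \sup_{h \in \mathcal{H}}
  \left| L_{\mathcal{D}}(h) - \hat{L}_S(h) \right|
  \le \epsilon_{\mathrm{unif}}(m, \delta)
\right] \ge 1 - \delta
```

Under the 0-1 loss the error never exceeds 1, so a bound with a right-hand side of 1 or more says nothing about the learned network; this is the sense in which such a bound is called vacuous.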

Abstract

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive data-dependent, uniform-convergence-based generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms