Uniform convergence may be unable to explain generalization in deep learning
Vaishnavh Nagarajan, J. Zico Kolter

TL;DR
This paper shows that uniform convergence bounds often fail to explain why overparameterized deep networks generalize well: the bounds can be vacuous, and in practice they can even grow with the training set size. This exposes a limitation of current theoretical explanations of generalization.
Contribution
The paper provides empirical evidence and theoretical examples showing uniform convergence cannot fully account for generalization in deep learning, challenging existing bounds.
Findings
Uniform convergence bounds can increase with dataset size in practice.
Uniform convergence cannot explain generalization in certain overparameterized models.
Existing bounds often yield vacuous guarantees for models with low test error.
Abstract
Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence. While it is well-known that many of these existing bounds are numerically large, through numerous experiments, we bring to light a more concerning aspect of these bounds: in practice, these bounds can increase with the training dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by gradient descent (GD) where uniform convergence provably cannot "explain generalization" -- even if we take into account the implicit bias of GD to the fullest extent possible. More precisely, even if we consider only the set of classifiers output by GD,…
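The overparameterized linear setting mentioned in the abstract can be illustrated with a small numerical sketch. Everything specific below is an assumption made for illustration and should be checked against the paper: the data model (a low-dimensional signal 2·y·u with ||u|| = 1/sqrt(m), plus D-dimensional Gaussian noise of variance 32/D), the one-step learner w = sum of y_i·x_i, and the helper names `sample`, `error`, and `run`. The sketch compares the error on the training set, on fresh test data, and on a copy of the training set whose noise coordinates are negated. That copy is distributed like an ordinary sample, so a classifier that generalizes well yet fails on it hints at why a bound that must hold uniformly over such samples can be loose.

```python
import numpy as np

def sample(m, D, u, rng):
    """Draw m points: signal block 2*y*u and a high-dimensional noise block."""
    y = rng.choice([-1.0, 1.0], size=m)
    x1 = 2.0 * y[:, None] * u[None, :]                    # signal, shape (m, K)
    x2 = rng.normal(0.0, np.sqrt(32.0 / D), size=(m, D))  # noise, shape (m, D)
    return x1, x2, y

def error(w1, w2, x1, x2, y):
    """0-1 error of the linear classifier sign(<w1, x1> + <w2, x2>)."""
    scores = x1 @ w1 + x2 @ w2
    return float(np.mean(y * scores <= 0))

def run(m=10, K=5, D=10_000, n_test=500, seed=0):
    rng = np.random.default_rng(seed)
    u = np.ones(K) / np.sqrt(K * m)  # assumed scaling: ||u||_2 = 1/sqrt(m)

    x1, x2, y = sample(m, D, u, rng)

    # Assumed learner: one gradient step from zero that increases y * <w, x>
    # with learning rate 1, which gives w = sum_i y_i * x_i.
    w1 = (y[:, None] * x1).sum(axis=0)
    w2 = (y[:, None] * x2).sum(axis=0)

    t1, t2, ty = sample(n_test, D, u, rng)
    return {
        "train_error": error(w1, w2, x1, x2, y),
        "test_error": error(w1, w2, t1, t2, ty),
        # Same training points, noise block negated.
        "flipped_train_error": error(w1, w2, x1, -x2, y),
    }

results = run()
print(results)
```

With D much larger than m, the learned weights memorize the training noise. That memorized component helps on the original training points, hurts on the negated copy, and contributes only a small perturbation on fresh data, which is the effect the three error values are meant to expose.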
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
