Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P. Adams, and Peter Orbanz

TL;DR
This paper establishes non-vacuous generalization bounds for large neural networks on ImageNet by linking model compression to generalization, providing the first such guarantees at this scale.
Contribution
It introduces a PAC-Bayesian compression-based generalization bound applicable to realistic architectures on ImageNet, connecting compression limits with overfitting.
Findings
State-of-the-art non-vacuous generalization guarantees for ImageNet models
Overfitting correlates with increased model description length
Compression limits relate to expected generalization error
Abstract
Modern neural networks are highly overparameterized, with capacity to substantially overfit to training data. Nevertheless, these networks often generalize well in practice. It has also been observed that trained networks can often be "compressed" to much smaller representations. The purpose of this paper is to connect these two empirical observations. Our main technical result is a generalization bound for compressed networks based on the compressed size. Combined with off-the-shelf compression algorithms, the bound leads to state-of-the-art generalization guarantees; in particular, we provide the first non-vacuous generalization guarantees for realistic architectures applied to the ImageNet classification problem. As additional evidence connecting compression and generalization, we show that compressibility of models that tend to overfit is limited: we establish an absolute limit on…
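To make the link between compressed size and generalization concrete, here is a minimal sketch of the classical Occam (description-length) bound. This is an assumption-laden illustration, not the paper's PAC-Bayesian bound, which is tighter and more involved. It assumes a model encoded in `compressed_bits` bits under a prefix-free code, `n_samples` i.i.d. training examples, and a 0-1 loss; the function name and the example sizes are hypothetical.

```python
import math


def occam_bound(train_error, compressed_bits, n_samples, delta=0.05):
    """Classical Occam bound on the true error of a compressed model.

    With probability at least 1 - delta over the training sample:
        true_error <= train_error
                      + sqrt((bits * ln 2 + ln(1/delta)) / (2 * n_samples))

    NOTE: this is the textbook Hoeffding-plus-union-bound result, used here
    only as an illustration. It is NOT the bound derived in the paper.
    """
    complexity = compressed_bits * math.log(2) + math.log(1.0 / delta)
    return train_error + math.sqrt(complexity / (2.0 * n_samples))


# Illustrative numbers only (not results from the paper):
# roughly 1.28 million training images, 10% training error.
n = 1_280_000
small_model = occam_bound(0.10, compressed_bits=8e5, n_samples=n)  # ~100 KB
large_model = occam_bound(0.10, compressed_bits=8e6, n_samples=n)  # ~1 MB

print(f"~100 KB model: bound = {small_model:.3f}")
print(f"~1 MB model:   bound = {large_model:.3f}")
```

A bound at or above 1 is vacuous, since the error rate can never exceed 1. In this sketch the bound only becomes informative once the compressed size is small relative to the number of training examples, which is the intuition behind tying compression to generalization.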
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Advanced Neural Network Applications
