Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

Gintare Karolina Dziugaite; Daniel M. Roy

arXiv:1703.11008·cs.LG·October 20, 2017·250 cites

Open Access · 3 Repos

TL;DR

This paper develops a method for computing nonvacuous generalization bounds for deep stochastic neural networks with many more parameters than training examples, a step toward explaining why such networks generalize despite overparameterization.

Contribution

It extends the PAC-Bayes approach of Langford and Caruana (2001) by optimizing the bound directly, yielding nonvacuous bounds for stochastic networks with millions of parameters trained on limited data.

Findings

01. Achieves nonvacuous generalization bounds for deep stochastic neural networks.

02. Connects the results to flat-minima and minimum description length (MDL) explanations of generalization.

03. Demonstrates the bounds on networks with millions of parameters trained on only tens of thousands of examples.

Abstract

One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data. In light of this capacity for overfitting, it is remarkable that simple algorithms like SGD reliably return solutions with low test error. One roadblock to explaining these phenomena in terms of implicit regularization, structural properties of the solution, and/or easiness of the data is that many learning bounds are quantitatively vacuous when applied to networks learned by SGD in this "deep learning" regime. Logically, in order to explain generalization, we need nonvacuous bounds. We return to an idea by Langford and Caruana (2001), who used PAC-Bayes bounds to compute nonvacuous numerical bounds on generalization error for stochastic two-layer two-hidden-unit neural networks via a sensitivity analysis. By optimizing the PAC-Bayes bound directly, we are…
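To make "computing a numerical bound" concrete, here is a minimal Python sketch of how a PAC-Bayes bound can be evaluated once the empirical error of the stochastic network and the KL divergence between posterior and prior are known. It assumes one common form of the bound, kl(empirical error || true error) ≤ (KL(Q||P) + ln(m/δ)) / (m − 1), which is not stated on this page and may differ from the exact statement used in the paper. The function names and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
import math


def binary_kl(q: float, p: float) -> float:
    """KL divergence (in nats) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp p to avoid log(0)
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def kl_inverse_upper(q: float, c: float, tol: float = 1e-9) -> float:
    """Upper end of the set {p in [q, 1] : binary_kl(q, p) <= c}, by bisection.

    binary_kl(q, p) is increasing in p for p >= q, so bisection applies.
    Returning the upper bracket keeps the result on the conservative side.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_kl(q, mid) <= c:
            lo = mid
        else:
            hi = mid
    return hi


def pac_bayes_bound(train_error: float, kl_qp: float, m: int,
                    delta: float = 0.05) -> float:
    """Bound on the true error of the stochastic classifier.

    ASSUMPTION: uses the form (KL(Q||P) + ln(m / delta)) / (m - 1) for the
    right-hand side; other PAC-Bayes variants use different constants.
    """
    c = (kl_qp + math.log(m / delta)) / (m - 1)
    return kl_inverse_upper(train_error, c)


# Hypothetical inputs: 3% empirical error, 55,000 training examples.
small_kl_bound = pac_bayes_bound(train_error=0.03, kl_qp=5_000.0, m=55_000)
large_kl_bound = pac_bayes_bound(train_error=0.03, kl_qp=1e7, m=55_000)
print(f"moderate KL: {small_kl_bound:.3f}, very large KL: {large_kl_bound:.3f}")
```

Under this form, the bound is informative only when KL(Q||P) is small relative to the number of training examples m. When the KL term dominates, the inverted bound approaches 1, which is the sense in which a bound is "vacuous".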

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: Stochastic Gradient Descent