Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors
Gintare Karolina Dziugaite, Daniel M. Roy

TL;DR
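This paper connects Entropy-SGD with PAC-Bayes bounds. Because Entropy-SGD chooses the bound's prior using the training data, the standard PAC-Bayes guarantee does not apply. The paper shows that data-dependent priors obtained via SGLD can nonetheless yield valid generalization guarantees.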
Contribution
It shows that Entropy-SGD optimizes the prior of a PAC-Bayes bound. It then uses SGLD to obtain data-dependent priors that yield valid bounds when the target distribution of SGLD is ε-differentially private.
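The SGLD component can be illustrated with a minimal Python sketch of the standard SGLD update. This is not the authors' code. Each step moves half a step size along the negative gradient of an energy U and adds Gaussian noise whose variance equals the step size, so the iterates approximately sample from a density proportional to exp(−U). The function names, the toy quadratic energy, its inverse temperature `tau`, and all hyperparameters are hypothetical choices for illustration. With an exact gradient the update is the unadjusted Langevin algorithm; SGLD proper substitutes a minibatch gradient estimate.

```python
import math
import random


def sgld(grad_u, theta0, step, n_steps, rng):
    """Return SGLD iterates targeting a density proportional to exp(-U(theta)).

    Each step: theta <- theta - (step / 2) * grad_u(theta) + N(0, step * I).
    Passing an exact gradient gives the unadjusted Langevin algorithm;
    SGLD proper passes a minibatch estimate of the gradient instead.
    """
    theta = list(theta0)
    samples = []
    for _ in range(n_steps):
        g = grad_u(theta)
        theta = [t - 0.5 * step * gi + rng.gauss(0.0, math.sqrt(step))
                 for t, gi in zip(theta, g)]
        samples.append(list(theta))
    return samples


def grad_toy_energy(theta, tau=4.0):
    """Gradient of the hypothetical energy U(theta) = tau * (theta - 2)^2 / 2."""
    return [tau * (theta[0] - 2.0)]


if __name__ == "__main__":
    draws = sgld(grad_toy_energy, [0.0], step=0.05, n_steps=50_000,
                 rng=random.Random(0))
    kept = [d[0] for d in draws[5_000:]]  # discard burn-in
    mean = sum(kept) / len(kept)
    var = sum((k - mean) ** 2 for k in kept) / len(kept)
    print(f"mean={mean:.3f} var={var:.3f}")  # target is N(2, 1/tau) = N(2, 0.25)
```

The finite step size biases the stationary distribution: for this Gaussian target the chain's stationary variance is step / (1 − (1 − step·tau/2)²) ≈ 0.263 rather than 0.25, while the stationary mean is exact.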
Findings
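Entropy-SGD optimizes the prior of a PAC-Bayes bound on the risk of a Gibbs classifier.
Data-dependent priors obtained via SGLD yield valid bounds when SGLD's target distribution is ε-differentially private.
Test error on MNIST and CIFAR10 falls within the computed risk bounds.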
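To make "within the risk bounds" concrete, the sketch below computes one common form of the PAC-Bayes-kl bound for a Gibbs classifier with diagonal Gaussian prior P and posterior Q: with probability at least 1 − δ over m i.i.d. training examples, kl(ê_Q ‖ e_Q) ≤ (KL(Q‖P) + ln(2√m/δ)) / m, where ê_Q and e_Q are the empirical and true risks of the Gibbs classifier. Whether the paper uses exactly this form is an assumption here, and this standard version presumes a prior chosen independently of the data, which is precisely the hypothesis Entropy-SGD violates. The function names and the inputs in the usage example (m = 60000, KL = 5000 nats, δ = 0.05) are hypothetical. The bound on e_Q comes from inverting the binary kl numerically by bisection.

```python
import math


def kl_bernoulli(q, p):
    """KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep logs finite
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out


def kl_inverse(q, c, tol=1e-10):
    """Smallest upper value p in [q, 1] with kl(q || p) ~= c, via bisection.

    kl(q || p) is increasing in p on [q, 1], so bisection applies.
    Returns the upper end of the final bracket, which errs on the safe side.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(q, mid) > c:
            hi = mid
        else:
            lo = mid
    return hi


def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) for diagonal Gaussians given per-coordinate means and variances."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (vq / vp + (mq - mp) ** 2 / vp - 1.0 + math.log(vp / vq))
    return total


def pac_bayes_kl_bound(emp_risk, kl_qp, m, delta):
    """Upper bound on the Gibbs risk from one common PAC-Bayes-kl form."""
    c = (kl_qp + math.log(2 * math.sqrt(m) / delta)) / m
    return kl_inverse(emp_risk, c)


if __name__ == "__main__":
    # Hypothetical inputs: 5% empirical Gibbs risk, 60000 examples, KL of 5000 nats.
    print(pac_bayes_kl_bound(emp_risk=0.05, kl_qp=5000.0, m=60_000, delta=0.05))
```

Because the KL term is divided by m, the bound is only nonvacuous when the posterior stays close to the prior relative to the sample size, which is why the choice of prior matters so much.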
Abstract
We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier. Entropy-SGD works by optimizing the bound's prior, violating the hypothesis of the PAC-Bayes theorem that the prior is chosen independently of the data. Indeed, available implementations of Entropy-SGD rapidly obtain zero training error on random labels and the same holds of the Gibbs posterior. In order to obtain a valid generalization bound, we rely on a result showing that data-dependent priors obtained by stochastic gradient Langevin dynamics (SGLD) yield valid PAC-Bayes bounds provided the target distribution of SGLD is ε-differentially private. We observe that test error on MNIST and CIFAR10 falls…
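The Entropy-SGD mechanism referred to above can be illustrated with a simplified sketch of one outer update in the style of Chaudhari et al. (2017): an inner SGLD loop samples x′ from a Gibbs measure proportional to exp(−f(x′) − (γ/2)‖x − x′‖²), an exponential average μ estimates that measure's mean, and the outer step x ← x − ηγ(x − μ) descends the negative local entropy. Reading x as the center of a Gaussian prior and the local Gibbs measure as the posterior is our loose interpretation of "optimizing the bound's prior", not code from either paper. The function names, the toy quadratic loss, and all hyperparameters (`gamma`, `eta`, `inner_lr`, `noise`, `alpha`) are hypothetical.

```python
import math
import random


def entropy_sgd_step(x, grad_f, gamma, eta, inner_steps, inner_lr, noise, alpha, rng):
    """One simplified outer Entropy-SGD update; returns the new weights.

    Inner loop: SGLD on x' for the energy f(x') + gamma/2 * ||x - x'||^2,
    with an exponential moving average mu of the x' iterates.
    Outer step: x <- x - eta * gamma * (x - mu).
    """
    x_prime = list(x)
    mu = list(x)
    for _ in range(inner_steps):
        g = grad_f(x_prime)
        # Gradient of f(x') + gamma/2 * ||x - x'||^2 with respect to x'.
        d = [gi - gamma * (xi - xpi) for gi, xi, xpi in zip(g, x, x_prime)]
        # Langevin step with "thermal" noise scaled by `noise`.
        x_prime = [xpi - inner_lr * di + math.sqrt(inner_lr) * noise * rng.gauss(0.0, 1.0)
                   for xpi, di in zip(x_prime, d)]
        mu = [(1 - alpha) * mi + alpha * xpi for mi, xpi in zip(mu, x_prime)]
    return [xi - eta * gamma * (xi - mi) for xi, mi in zip(x, mu)]


def grad_quadratic(v):
    """Gradient of the hypothetical toy loss f(v) = (v - 3)^2 / 2."""
    return [vi - 3.0 for vi in v]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0]
    for _ in range(200):
        x = entropy_sgd_step(x, grad_quadratic, gamma=1.0, eta=0.5, inner_steps=20,
                             inner_lr=0.1, noise=0.01, alpha=0.75, rng=rng)
    print(x)  # should be close to the minimizer 3.0
```

On this toy loss the Gibbs mean is (3 + γx)/(1 + γ), so x − μ is approximately (x − 3)/(1 + γ) and the outer iterates contract toward the minimizer; on a real network the same update instead favors wide, low-loss regions.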
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Machine Learning and Data Classification
