Dissecting Non-Vacuous Generalization Bounds based on the Mean-Field Approximation

Konstantinos Pitas

arXiv:1909.03009 · cs.LG · March 6, 2020 · 1 citation

TL;DR

This paper critically examines the effectiveness of mean-field variational inference in PAC-Bayes bounds for neural networks, finding it offers limited benefits and advocating for richer posterior models.

Contribution

The study empirically demonstrates the limitations of mean-field approximations in PAC-Bayes bounds and suggests exploring more complex posteriors for better generalization guarantees.

Findings

1. Mean-field VI yields negligible improvements in bounds.
2. Optimization issues are not the main cause of poor bounds.
3. Richer posterior models are promising for future research.

Abstract

Explaining how overparametrized neural networks simultaneously achieve low risk and zero empirical risk on benchmark datasets is an open problem. PAC-Bayes bounds optimized using variational inference (VI) have recently been proposed as a promising direction for obtaining non-vacuous bounds. We show empirically that this approach gives negligible gains when the posterior is modeled as a Gaussian with diagonal covariance, known as the mean-field approximation. We investigate common explanations, such as the failure of VI due to optimization problems or the choice of a suboptimal prior. Our results suggest that investigating richer posteriors is the most promising direction forward.
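
To make the quantities in the abstract concrete, the following is a minimal sketch of how a PAC-Bayes bound can be evaluated for a mean-field (diagonal Gaussian) posterior. It assumes an isotropic Gaussian prior and uses one standard McAllester-style form of the bound (with Maurer's constant); the paper may use a different variant, and the function names and toy numbers here are hypothetical, chosen only for illustration.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for Q = N(mu_q, diag(sigma_q^2)) and P = N(mu_p, sigma_p^2 I).

    mu_q, sigma_q, mu_p are equal-length sequences; sigma_p is a scalar
    (isotropic prior, an assumption made for this sketch).
    """
    kl = 0.0
    for m_q, s_q, m_p in zip(mu_q, sigma_q, mu_p):
        var_ratio = (s_q / sigma_p) ** 2          # sigma_q^2 / sigma_p^2
        mean_term = ((m_q - m_p) / sigma_p) ** 2  # squared, prior-scaled mean shift
        kl += 0.5 * (var_ratio + mean_term - 1.0 - math.log(var_ratio))
    return kl


def pac_bayes_bound(empirical_risk, kl, n, delta=0.05):
    """Upper bound on the expected risk of the stochastic classifier Q.

    Assumed form (holds with probability at least 1 - delta over n i.i.d. samples):
        risk <= empirical_risk + sqrt((KL + ln(2 sqrt(n) / delta)) / (2 n))
    """
    complexity = (kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    return empirical_risk + math.sqrt(complexity)


if __name__ == "__main__":
    # Toy numbers, not results from the paper.
    d = 1000                      # number of weights
    mu_p = [0.0] * d              # prior mean
    mu_q = [0.1] * d              # posterior mean after training
    sigma_q = [0.05] * d          # per-weight posterior standard deviation
    kl = kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p=0.1)
    bound = pac_bayes_bound(empirical_risk=0.02, kl=kl, n=50_000)
    print(f"KL = {kl:.1f}, bound = {bound:.3f}")
    print("vacuous" if bound >= 1.0 else "non-vacuous")
```

In this form the bound is non-vacuous only when it falls below 1. The KL term grows with the number of weights and with the distance between posterior and prior means, so it tends to dominate for large networks.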

Videos

Dissecting Non-Vacuous Generalization Bounds based on the Mean-Field Approximation · SlidesLive

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Generative Adversarial Networks and Image Synthesis