Information Complexity and Generalization Bounds

Pradeep Kr. Banerjee; Guido Montúfar

arXiv:2105.01747 · cs.LG · October 26, 2021

TL;DR

This paper unifies PAC-Bayesian and mutual information bounds on generalization error, introduces new bounds for complex scenarios, and discusses practical algorithms like Entropy-SGD and PAC-Bayes-SGD for neural networks.

Contribution

It provides a unifying framework, based on Tong Zhang's information exponential inequality (IEI), for deriving generalization bounds, including new bounds for data-dependent priors and unbounded losses.
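
For orientation, here is a standard statement of Zhang's information exponential inequality in our own notation, which may differ from the paper's: S is the training sample, S' an independent copy of it, π a prior fixed before seeing S, Q_S any data-dependent posterior, D the KL divergence, and f any measurable function. It follows from the Donsker-Varadhan variational formula together with Fubini's theorem. The corollary underneath is a textbook illustration of how a bound "falls out" of the IEI under an assumption on the loss (bounded in [0,1], n i.i.d. examples, fixed λ > 0); it is not claimed to reproduce any specific result of the paper.

```latex
% Information exponential inequality (IEI)
\mathbb{E}_{S}\exp\Big(\mathbb{E}_{W\sim Q_S}\big[f(W,S)-\ln\mathbb{E}_{S'}\,e^{f(W,S')}\big]-D(Q_S\,\|\,\pi)\Big)\le 1 .

% Corollary (bounded loss): take f(W,S)=\lambda\,(L_{\mathcal D}(W)-L_S(W)).
% Hoeffding's lemma gives \ln\mathbb{E}_{S'}e^{f(W,S')}\le \lambda^2/(8n);
% Markov's inequality then gives, with probability at least 1-\delta over S,
\mathbb{E}_{W\sim Q_S}\big[L_{\mathcal D}(W)\big]
  \le \mathbb{E}_{W\sim Q_S}\big[L_S(W)\big]
  + \frac{D(Q_S\,\|\,\pi)+\ln(1/\delta)}{\lambda}
  + \frac{\lambda}{8n}.
```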

Findings

01

Several existing bounds are derived as corollaries of the IEI.

02

New bounds are established for data-dependent priors and unbounded loss functions.

03

Practical variants of the Gibbs algorithm, Entropy-SGD and PAC-Bayes-SGD, are discussed for training neural networks.

Abstract

We present a unifying picture of PAC-Bayesian and mutual information-based upper bounds on the generalization error of randomized learning algorithms. As we show, Tong Zhang's information exponential inequality (IEI) gives a general recipe for constructing bounds of both flavors. We show that several important results in the literature can be obtained as simple corollaries of the IEI under different assumptions on the loss function. Moreover, we obtain new bounds for data-dependent priors and unbounded loss functions. Optimizing the bounds gives rise to variants of the Gibbs algorithm, for which we discuss two practical examples for learning with neural networks, namely, Entropy-SGD and PAC-Bayes-SGD. Further, we use an Occam's factor argument to show a PAC-Bayesian bound that incorporates second-order curvature information of the training loss.
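
Minimizing an objective of the form β·E_Q[L_S] + D(Q‖π) over distributions Q gives the Gibbs posterior Q(h) ∝ π(h)·exp(−β·L_S(h)), whose optimal value is −ln Z with Z the normalizer. The minimal sketch below checks this numerically, assuming a finite hypothesis class; the names `gibbs_posterior`, `pac_bayes_objective` and `beta` are hypothetical and not taken from the paper. It illustrates only the classical Gibbs algorithm; Entropy-SGD and PAC-Bayes-SGD, the scalable variants for neural networks, are not implemented here.

```python
import numpy as np

def gibbs_posterior(prior, emp_risk, beta):
    """Gibbs posterior Q(h) ∝ prior(h) * exp(-beta * emp_risk(h)) on a finite class.

    Returns the posterior and log Z, the log of its normalizing constant.
    """
    log_w = np.log(prior) - beta * emp_risk
    log_z = np.logaddexp.reduce(log_w)  # log-sum-exp, computed stably
    return np.exp(log_w - log_z), log_z

def pac_bayes_objective(q, prior, emp_risk, beta):
    """beta * E_Q[empirical risk] + KL(Q || prior), with 0 * log 0 treated as 0."""
    mask = q > 0
    kl = np.sum(q[mask] * (np.log(q[mask]) - np.log(prior[mask])))
    return beta * np.dot(q, emp_risk) + kl

# Toy example: uniform prior over 5 hypotheses with made-up empirical risks.
rng = np.random.default_rng(0)
prior = np.full(5, 0.2)
emp_risk = np.array([0.30, 0.10, 0.25, 0.40, 0.15])
beta = 20.0

q_star, log_z = gibbs_posterior(prior, emp_risk, beta)
best = pac_bayes_objective(q_star, prior, emp_risk, beta)
print(np.isclose(best, -log_z))  # True: the optimum equals -log Z

# Any other distribution Q scores at least as high, since F(Q) - F(Q*) = KL(Q || Q*) >= 0.
for _ in range(1000):
    q = rng.dirichlet(np.ones(5))
    assert pac_bayes_objective(q, prior, emp_risk, beta) >= best - 1e-12
```

Working in log space via `np.logaddexp.reduce` keeps the normalizer finite even for large β, where exp(−β·L_S) would underflow.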

Taxonomy

Methods: Stochastic Gradient Descent