Generalization Bounds via Convex Analysis

Gábor Lugosi; Gergely Neu

arXiv:2202.04985·stat.ML·July 20, 2022·6 cites

TL;DR

This paper extends generalization bounds in supervised learning by replacing mutual information with strongly convex dependence measures, enabling bounds for heavy-tailed and smooth loss functions using convex analysis.

Contribution

It introduces a framework to replace mutual information with any strongly convex dependence measure for deriving generalization bounds, broadening applicability.

Findings

1. Bounds in terms of p-norm divergences and Wasserstein-2 distance.
2. Applicable to heavy-tailed loss distributions.
3. Applicable to highly smooth loss functions.

Abstract

Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and the output, given that the loss of any fixed hypothesis has a subgaussian tail. In this work, we generalize this result beyond the standard choice of Shannon's mutual information to measure the dependence between the input and the output. Our main result shows that it is indeed possible to replace the mutual information by any strongly convex function of the joint input-output distribution, with the subgaussianity condition on the losses replaced by a bound on an appropriately chosen norm capturing the geometry of the dependence measure. This allows us to derive a range of generalization bounds that are either entirely new or strengthen…
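
For orientation, the classical mutual-information bound that the abstract refers to can be written as follows. This is a sketch of the standard statement from Xu and Raginsky (2017), not a formula taken from this page; the symbols $S$, $W$, $n$, $\sigma$, $\mu$, and $\ell$ are notation assumed here for illustration.

```latex
% Assumed notation (not from the page above):
%   S = (Z_1, ..., Z_n): training sample of n i.i.d. draws from \mu (the input)
%   W: hypothesis returned by the learning algorithm (the output)
%   \ell(w, Z): loss of hypothesis w on data point Z
% Assumption: for every fixed w, \ell(w, Z) is \sigma-subgaussian under Z ~ \mu.
\[
  \left| \mathbb{E}\!\left[ \mathbb{E}_{Z \sim \mu}\,\ell(W, Z)
      - \frac{1}{n} \sum_{i=1}^{n} \ell(W, Z_i) \right] \right|
  \;\le\; \sqrt{\frac{2\sigma^{2}\, I(W; S)}{n}}
\]
```

Here $I(W; S)$ is Shannon's mutual information between the sample and the learned hypothesis. The generalization described in the abstract replaces $I(W; S)$ with a strongly convex function of the joint input-output distribution and replaces the subgaussianity assumption with a bound on a suitably chosen norm.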

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning · Statistical Methods and Inference