Fast Rate Information-theoretic Bounds on Generalization Errors

Xuetong Wu; Jonathan H. Manton; Uwe Aickelin; Jingge Zhu

arXiv:2303.14658·cs.IT·June 24, 2025·1 cites

TL;DR

This paper improves information-theoretic bounds on the generalization error by establishing conditions under which these bounds are asymptotically tight and introducing new bounds based on the $(\eta, c)$-central condition.

Contribution

It demonstrates that fast convergence rates can be achieved under certain assumptions and introduces new bounds based on the $(\eta, c)$-central condition that directly relate mutual information to convergence rates.

Findings

01. Bounds can be asymptotically tight under appropriate assumptions.

02. The $(\eta, c)$-central condition simplifies verification of bounds.

03. Numerical examples confirm the effectiveness of the new bounds.

Abstract

The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been derived in the literature, where the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual sample mutual information bound by Bu et al., which itself is a tightened version of the first bound on the topic by Russo et al. and Xu et al., this paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size $n$. It has been recognized that these bounds are in general not tight, readily verified for the exemplary quadratic Gaussian mean estimation problem, where the individual sample mutual…
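The rate gap mentioned for the quadratic Gaussian mean estimation problem can be illustrated numerically. The sketch below is not taken from the paper; it assumes the standard setup of i.i.d. samples $Z_i \sim \mathcal{N}(\mu, \sigma^2)$, the sample mean as hypothesis $W$, and squared loss $(w - z)^2$. Under those assumptions the expected generalization error is $2\sigma^2/n$, while $I(W; Z_i) = \tfrac{1}{2}\log\frac{n}{n-1}$, so a bound that scales with $\sqrt{I(W; Z_i)}$ decays only like $n^{-1/2}$. The function names are hypothetical, and the constants of any specific bound are deliberately left out.

```python
import math
import random


def exact_generalization_gap(n, sigma=1.0):
    """Closed-form expected gap E[test loss - train loss] = 2*sigma^2/n
    for the sample mean under squared loss (assumed setup, see lead-in)."""
    return 2.0 * sigma ** 2 / n


def individual_sample_mi(n):
    """I(W; Z_i) in nats for the sample mean W of n >= 2 i.i.d. Gaussians.
    W and Z_i are jointly Gaussian with squared correlation 1/n."""
    return 0.5 * math.log(n / (n - 1))


def simulated_generalization_gap(n, sigma=1.0, mu=0.0, trials=20000, seed=0):
    """Monte Carlo estimate of the expected generalization gap."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        w = sum(sample) / n  # learned hypothesis: the sample mean
        train_loss = sum((w - z) ** 2 for z in sample) / n
        # Population risk of a fixed w: E[(w - Z)^2] = sigma^2 + (w - mu)^2
        test_loss = sigma ** 2 + (w - mu) ** 2
        total += test_loss - train_loss
    return total / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        gap = exact_generalization_gap(n)
        sqrt_mi = math.sqrt(individual_sample_mi(n))
        # gap shrinks like 1/n, sqrt(MI) only like 1/sqrt(n)
        print(n, gap, sqrt_mi, sqrt_mi / gap)
```

The printed ratio grows with $n$, which is the sense in which a square-root mutual information bound is loose in this example.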

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM