Fast Rate Information-theoretic Bounds on Generalization Errors
Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu

TL;DR
This paper improves information-theoretic bounds on the generalization error by establishing conditions under which these bounds are asymptotically tight and introducing new bounds based on the $(\eta, c)$-central condition.
Contribution
It demonstrates that fast convergence rates can be achieved under certain assumptions and introduces new bounds based on the $(\eta, c)$-central condition that directly relate mutual information to convergence rates.
Findings
Bounds can be asymptotically tight with appropriate assumptions.
The $(\eta, c)$-central condition simplifies verification of bounds.
Numerical examples confirm the effectiveness of the new bounds.
Abstract
The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been derived in the literature, where the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual sample mutual information bound by Bu et al., which itself is a tightened version of the first bound on the topic by Russo et al. and Xu et al., this paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size $n$. It has been recognized that these bounds are in general not tight, as is readily verified for the exemplary quadratic Gaussian mean estimation problem, where the individual sample mutual…
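To make the notion of generalization error concrete, the following is a minimal Monte Carlo sketch for a quadratic Gaussian mean estimation setting. It assumes the learner outputs the sample mean of $n$ i.i.d. Gaussian samples and that the loss is the squared error $(w - z)^2$; the function name `generalization_gap` and its parameters are illustrative and not taken from the paper. Under these assumptions, a direct calculation gives an expected gap of $2\sigma^2/n$, which the simulation should approximately reproduce.

```python
import random


def generalization_gap(n, sigma=1.0, mu=0.0, trials=20000, seed=0):
    """Monte Carlo estimate of E[test loss - training loss].

    Assumed setup (illustrative): data are i.i.d. N(mu, sigma^2),
    the hypothesis w is the sample mean, and the loss is (w - z)^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        train = [rng.gauss(mu, sigma) for _ in range(n)]
        w = sum(train) / n  # hypothesis: empirical mean of the training set
        train_loss = sum((w - z) ** 2 for z in train) / n
        z_new = rng.gauss(mu, sigma)  # one fresh, unseen test sample
        test_loss = (w - z_new) ** 2
        total += test_loss - train_loss
    return total / trials


# Compare the simulated gap with the closed form 2 * sigma^2 / n.
for n in (5, 10, 20):
    print(n, generalization_gap(n), 2 * 1.0 / n)
```

The simulated gap shrinks roughly in proportion to $1/n$ as the sample size grows, which is the kind of rate against which a bound's dependence on $n$ can be compared.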
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
