Fast Rate Generalization Error Bounds: Variations on a Theme
Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu

TL;DR
This paper explores conditions under which fast O(1/n) generalization error bounds can be achieved using information-theoretic measures, challenging the common belief that such bounds are inherently slow because of their square-root dependence.
Contribution
It introduces the (eta,c)-central condition that enables fast rate bounds and demonstrates how information-theoretic bounds can be applied under this condition for specific algorithms.
Findings
Fast rate (O(1/n)) bounds are possible under certain assumptions.
The (eta,c)-central condition is key for achieving fast rates.
Analytical examples validate the effectiveness of the proposed bounds.
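To make the O(1/n) rate concrete, here is a minimal simulation sketch. The setting is an assumption chosen for illustration and is not claimed to reproduce the paper's examples: one-dimensional Gaussian data with true mean 0, squared loss, and the sample mean as the learned hypothesis. In this toy case the expected generalization error has the closed form 2*sigma^2/n, so the Monte Carlo estimate should shrink roughly in proportion to 1/n. The function name and its parameters are hypothetical.

```python
import numpy as np

def expected_gen_error(n, sigma=1.0, trials=50000, seed=0):
    """Monte Carlo estimate of E[population risk - empirical risk]
    for the sample-mean hypothesis under squared loss (toy setting)."""
    rng = np.random.default_rng(seed)
    # Each row is one training set of n i.i.d. N(0, sigma^2) samples.
    z = rng.normal(0.0, sigma, size=(trials, n))
    w = z.mean(axis=1)  # empirical risk minimizer for squared loss
    # Empirical risk: average squared loss on the training set itself.
    empirical = ((z - w[:, None]) ** 2).mean(axis=1)
    # Population risk of a fixed w in closed form (true mean is 0):
    # E[(w - Z)^2] = w^2 + sigma^2.
    population = w ** 2 + sigma ** 2
    return float((population - empirical).mean())

# Compare the estimate with the closed form 2 * sigma^2 / n (sigma = 1).
for n in (5, 10, 20, 40):
    print(n, expected_gen_error(n), 2.0 / n)
```

Doubling n should roughly halve the estimate, whereas a slow O(1/sqrt(n)) rate would only shrink it by a factor of about 1.41.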
Abstract
A recent line of work, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form O(sqrt{lambda/n}), where lambda is an information-theoretic quantity such as the mutual information between the data sample and the learned hypothesis. However, such a learning rate is typically considered "slow" compared to the "fast rate" of O(1/n) achievable in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and that a fast rate (O(1/n)) result can still be obtained using this bound under appropriate assumptions. Furthermore, we identify the key condition needed for a fast rate generalization error bound, which we call the (eta,c)-central condition. Under this…
Taxonomy
Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
