Understanding Generalization via Leave-One-Out Conditional Mutual Information
Mahdi Haghifam, Shay Moran, Daniel M. Roy, Gintare Karolina Dziugaite

TL;DR
This paper investigates the role of leave-one-out conditional mutual information in understanding the generalization ability of learning algorithms, providing bounds, connections to classical error estimates, and applications to VC classes.
Contribution
The paper introduces leave-one-out CMI as a tool for analyzing generalization, establishes upper and lower bounds on the risk of interpolating algorithms in terms of it, and applies the framework to VC classes, answering open questions.
Findings
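Leave-one-out CMI controls the mean generalization error of learning algorithms with bounded loss functions.
For interpolating algorithms, the upper and lower bounds on risk are within a factor of two of each other when the limiting risk is constant or decays polynomially.
The resulting bound matches the optimal bound for learning VC classes in the realizable setting.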
Abstract
We study the mutual information between (certain summaries of) the output of a learning algorithm and its training data, conditional on a supersample of i.i.d. data from which the training data is chosen at random without replacement. These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean generalization error of learning algorithms with bounded loss functions. For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk. Using this connection, we obtain upper and lower bounds on risk in terms of the (evaluated) leave-one-out CMI. When the limiting risk is constant or decays polynomially, the bounds converge to within a…
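The classical leave-one-out error estimate mentioned in the abstract can be illustrated with a small sketch. This is not the paper's construction and computes no mutual information. It only shows the estimate itself: hold out each point in turn, train on the rest, and average the 0-1 loss on the held-out point. The 1-nearest-neighbor rule, the one-dimensional threshold data, and all function names are assumptions chosen for illustration. 1-NN is used because it interpolates, achieving zero empirical risk under 0-1 loss when the inputs are distinct.

```python
import random


def nearest_neighbor_predict(train, x):
    """Predict the label of x from the closest training point (1-NN).

    Hypothetical interpolating learner, chosen only for illustration.
    """
    _, nearest_label = min(train, key=lambda point: abs(point[0] - x))
    return nearest_label


def empirical_risk(sample):
    """Average 0-1 loss on the training sample itself."""
    mistakes = sum(int(nearest_neighbor_predict(sample, x) != y) for x, y in sample)
    return mistakes / len(sample)


def leave_one_out_error(sample):
    """Classical leave-one-out estimate of the risk under 0-1 loss."""
    mistakes = 0
    for i, (x, y) in enumerate(sample):
        rest = sample[:i] + sample[i + 1:]  # training set with point i removed
        mistakes += int(nearest_neighbor_predict(rest, x) != y)
    return mistakes / len(sample)


# Tiny hand-checkable sample: the two points near the label boundary
# are each misclassified when held out, the two outer points are not.
tiny = [(0.0, 0), (1.0, 0), (1.2, 1), (3.0, 1)]

# Hypothetical realizable data: the label is 1 exactly when x > 0.5.
rng = random.Random(0)
sample = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(50))]

if __name__ == "__main__":
    print("empirical risk:", empirical_risk(sample))
    print("leave-one-out error:", leave_one_out_error(sample))
```

On the `tiny` sample the empirical risk is zero while the leave-one-out error is 0.5, which shows why the held-out estimate, rather than the training error, carries information about the risk of an interpolating algorithm.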
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
