Understanding Generalization via Leave-One-Out Conditional Mutual Information

Mahdi Haghifam; Shay Moran; Daniel M. Roy; Gintare Karolina Dziugaite

arXiv:2206.14800 · cs.LG · June 30, 2022

TL;DR

This paper investigates the role of leave-one-out conditional mutual information in understanding the generalization ability of learning algorithms, providing bounds, connections to classical error estimates, and applications to VC classes.

Contribution

It introduces leave-one-out CMI as a tool for analyzing generalization, establishes bounds on risk, and applies the framework to VC classes, answering open questions.

Findings

01

Leave-one-out CMI controls the mean generalization error of learning algorithms with bounded loss functions.

02

For interpolating algorithms, the upper and lower bounds on risk converge to within a factor of two when the limiting risk is constant or decays polynomially.

03

Applied to VC classes, the framework matches the optimal bound for learning in the realizable setting.

Abstract

We study the mutual information between (certain summaries of) the output of a learning algorithm and its $n$ training data, conditional on a supersample of $n + 1$ i.i.d. data from which the training data is chosen at random without replacement. These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean generalization error of learning algorithms with bounded loss functions. For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk. Using this connection, we obtain upper and lower bounds on risk in terms of the (evaluated) leave-one-out CMI. When the limiting risk is constant or decays polynomially, the bounds converge to within a…
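The leave-one-out setup described in the abstract can be illustrated with a small sketch. It draws a supersample of $n + 1$ points, holds out each point in turn, trains on the remaining $n$, and averages the 0-1 loss on the held-out point, which is the classical leave-one-out error estimate. The 1-nearest-neighbor rule stands in for an interpolating algorithm, and the synthetic data, the noise level, and all function names are assumptions made for illustration; none of them come from the paper. The sketch does not compute the leave-one-out CMI itself.

```python
import random


def one_nn_predict(train, x):
    """Predict the label of x from its nearest training point (1-NN)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def empirical_risk(train):
    """Average 0-1 loss of 1-NN on its own training set."""
    errors = sum(one_nn_predict(train, x) != y for x, y in train)
    return errors / len(train)


def leave_one_out_error(supersample):
    """Average 0-1 loss on the held-out point over all n + 1 hold-outs."""
    mistakes = 0
    for u in range(len(supersample)):
        # Training set: the supersample with index u removed (n points).
        train = supersample[:u] + supersample[u + 1:]
        x_held, y_held = supersample[u]
        mistakes += int(one_nn_predict(train, x_held) != y_held)
    return mistakes / len(supersample)


def make_supersample(n, noise=0.1, seed=0):
    """Hypothetical data: n + 1 i.i.d. points on [0, 1] with noisy threshold labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(n + 1):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # flip the label with probability `noise`
        data.append((x, y))
    return data


if __name__ == "__main__":
    z = make_supersample(n=50)
    u = random.Random(1).randrange(len(z))  # index left out at random
    train = z[:u] + z[u + 1:]
    print("empirical risk on the n training points:", empirical_risk(train))
    print("leave-one-out error estimate:", leave_one_out_error(z))
```

With distinct inputs, each training point is its own nearest neighbor, so the empirical risk is zero and the leave-one-out estimate is the only nontrivial signal about the risk, which is the regime the abstract singles out.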

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and ELM · Stochastic Gradient Optimization Techniques