A New Family of Generalization Bounds Using Samplewise Evaluated CMI
Fredrik Hellström, Giuseppe Durisi

TL;DR
This paper introduces a new family of information-theoretic generalization bounds based on the samplewise evaluated conditional mutual information (CMI). The bounds recover and extend earlier results and, in some scenarios, give a tighter characterization of the population loss of deep neural networks.
Contribution
The paper develops a framework based on the evaluated CMI that recovers and extends existing information-theoretic bounds. Within this framework it derives a samplewise, average version of Seeger's PAC-Bayesian bound, which can be tighter for deep neural networks in some scenarios.
Findings
Derived a new samplewise, average PAC-Bayesian bound using evaluated CMI.
Achieved tighter bounds for deep neural networks in certain scenarios.
Unified various bounds for multiclass classification with finite Natarajan dimension.
Abstract
We present a new family of information-theoretic generalization bounds, in which the training loss and the population loss are compared through a jointly convex function. This function is upper-bounded in terms of the disintegrated, samplewise, evaluated conditional mutual information (CMI), an information measure that depends on the losses incurred by the selected hypothesis, rather than on the hypothesis itself, as is common in probably approximately correct (PAC)-Bayesian results. We demonstrate the generality of this framework by recovering and extending previously known information-theoretic bounds. Furthermore, using the evaluated CMI, we derive a samplewise, average version of Seeger's PAC-Bayesian bound, where the convex function is the binary KL divergence. In some scenarios, this novel bound results in a tighter characterization of the population loss of deep neural networks…
Taxonomy
Topics: Distributed Sensor Networks and Detection Algorithms · Bayesian Modeling and Causal Inference · Adversarial Robustness in Machine Learning
