The Role of Mutual Information in Variational Classifiers
Matias Vera, Leonardo Rey Vega, Pablo Piantanida

TL;DR
This paper develops an information-theoretic framework showing that, in a certain regime, the generalization error of variational classifiers is bounded by the mutual information between input features and their latent representations. The analysis is supported by numerical experiments.
Contribution
It derives bounds linking generalization error to mutual information in variational classifiers, providing theoretical insight into the regularization effect of the KL term.
Findings
Mutual information bounds effectively predict generalization error.
The KL divergence acts as a regularizer controlling mutual information.
Numerical experiments confirm mutual information's role in generalization on MNIST and CIFAR.
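The link between the KL term and mutual information can be checked numerically in a toy discrete setting. The sketch below is not taken from the paper: the encoder table, the uniform input distribution, and the uniform prior are all hypothetical choices made for illustration. It relies only on the standard identity that the average KL divergence from the encoder to any fixed prior equals the mutual information I(X;U) plus KL(q(u) || p(u)), so the average KL term is never smaller than the mutual information.

```python
import numpy as np


def kl(p, q):
    """KL(p || q) in nats between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


rng = np.random.default_rng(0)
n_inputs, n_codes = 6, 4

# Hypothetical stochastic encoder: row i is q(u | x_i).
encoder = rng.dirichlet(np.ones(n_codes), size=n_inputs)
# Assumption: inputs are uniformly distributed.
p_x = np.full(n_inputs, 1.0 / n_inputs)
# Marginal over codes induced by the encoder: q(u) = sum_x p(x) q(u | x).
marginal = p_x @ encoder
# Assumption: a fixed uniform prior p(u) plays the role of the KL target.
prior = np.full(n_codes, 1.0 / n_codes)

# I(X;U) = E_x[ KL(q(u|x) || q(u)) ]
mutual_info = sum(p_x[i] * kl(encoder[i], marginal) for i in range(n_inputs))
# Average KL regularizer: E_x[ KL(q(u|x) || p(u)) ]
avg_kl = sum(p_x[i] * kl(encoder[i], prior) for i in range(n_inputs))
# The difference between the two is exactly KL(q(u) || p(u)) >= 0.
gap = kl(marginal, prior)

print(f"I(X;U) = {mutual_info:.4f}, average KL = {avg_kl:.4f}, gap = {gap:.4f}")
```

Shrinking the average KL term therefore also shrinks an upper bound on the mutual information, which is one way to read the regularization effect described above.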
Abstract
Overfitting data is a well-known phenomenon related to the generation of a model that mimics too closely (or exactly) a particular instance of data, and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various, sometimes heuristic, regularization techniques, which are motivated by developing upper bounds on the generalization error. In this work, we study the generalization error of classifiers relying on stochastic encodings trained on the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds on the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information between input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds…
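As a rough illustration of a classifier with a stochastic encoding trained on the cross-entropy loss, the sketch below computes a cross-entropy term plus a weighted KL term for one batch. It is a minimal sketch under stated assumptions, not the authors' implementation: the linear Gaussian encoder, the linear decoder, the standard normal prior, the weight `beta`, and every parameter name are hypothetical.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mean, log_var):
    """Closed-form KL(N(mean, diag(exp(log_var))) || N(0, I)), one value per example."""
    return 0.5 * np.sum(np.exp(log_var) + mean**2 - 1.0 - log_var, axis=1)


def cross_entropy(logits, labels):
    """Per-example cross-entropy from unnormalized logits (stable log-softmax)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def variational_classifier_loss(x, labels, params, beta, rng):
    """Cross-entropy plus beta times the KL term, for a hypothetical linear model."""
    # Encoder (assumption): a diagonal Gaussian q(u | x) with linear mean and log-variance.
    mean = x @ params["w_mean"]
    log_var = x @ params["w_log_var"]
    # Draw one latent sample per input via the reparameterization u = mean + std * eps.
    eps = rng.standard_normal(mean.shape)
    u = mean + np.exp(0.5 * log_var) * eps
    # Decoder (assumption): a linear map from the latent sample to class logits.
    logits = u @ params["w_out"]
    ce = cross_entropy(logits, labels).mean()
    kl = gaussian_kl_to_standard_normal(mean, log_var).mean()
    return ce + beta * kl, ce, kl


rng = np.random.default_rng(0)
n_examples, n_features, n_latent, n_classes = 8, 5, 3, 4
x = rng.standard_normal((n_examples, n_features))
labels = rng.integers(0, n_classes, size=n_examples)
params = {
    "w_mean": 0.1 * rng.standard_normal((n_features, n_latent)),
    "w_log_var": 0.1 * rng.standard_normal((n_features, n_latent)),
    "w_out": 0.1 * rng.standard_normal((n_latent, n_classes)),
}
beta = 0.1  # hypothetical weight on the KL term
loss, ce, kl = variational_classifier_loss(x, labels, params, beta, rng)
print(f"loss = {loss:.4f} (cross-entropy = {ce:.4f}, KL = {kl:.4f})")
```

Here the latent representation is sampled from the encoding distribution rather than computed deterministically, and the KL term penalizes encoders that depart from the prior.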
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: Dropout
