The Role of Mutual Information in Variational Classifiers

Matias Vera; Leonardo Rey Vega; Pablo Piantanida

arXiv:2010.11642·stat.ML·April 14, 2023

TL;DR

This paper develops an information-theoretic analysis of variational classifiers, showing that in a certain regime their generalization error is bounded by the mutual information between the inputs and their latent representations. The analysis is supported by numerical experiments.

Contribution

The paper derives bounds linking the generalization error of variational classifiers to mutual information, giving theoretical insight into the regularization effect of the KL term.

Findings

01. Mutual information bounds effectively predict generalization error.

02. The KL divergence acts as a regularizer controlling mutual information.

03. Numerical experiments confirm mutual information's role in generalization on MNIST and CIFAR.
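
One standard way to see why a KL term can control mutual information is the identity E_x[KL(q(u|x) || p(u))] = I(X; U) + KL(q(u) || p(u)), which implies that the average KL to any fixed prior is at least I(X; U). The sketch below checks this numerically on a toy discrete encoder. The distributions p_x, q_u_given_x and prior_u are made-up values for illustration, and this is a general fact about KL divergence, not necessarily the argument used in the paper.

```python
import math

def kl(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy setup (assumed values): 3 inputs x, 2 latent codes u.
p_x = [0.5, 0.3, 0.2]                                  # input distribution p(x)
q_u_given_x = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]     # stochastic encoder q(u|x)
prior_u = [0.5, 0.5]                                   # fixed prior p(u)

# Marginal of the encoder: q(u) = sum_x p(x) q(u|x).
q_u = [sum(px * row[j] for px, row in zip(p_x, q_u_given_x)) for j in range(2)]

# Average KL to the prior (the regularizer) and the mutual information I(X; U).
avg_kl = sum(px * kl(row, prior_u) for px, row in zip(p_x, q_u_given_x))
mi = sum(px * kl(row, q_u) for px, row in zip(p_x, q_u_given_x))

# Gap between the two: how far the encoder marginal is from the prior.
gap = kl(q_u, prior_u)

print(f"avg KL = {avg_kl:.6f}, I(X;U) = {mi:.6f}, KL(q(u)||p(u)) = {gap:.6f}")
```

Because the gap term is non-negative, pushing the average KL down also pushes I(X; U) down, with equality when the prior matches the encoder marginal.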

Abstract

Overfitting is a well-known phenomenon in which a model mimics too closely (or exactly) a particular instance of data and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various, sometimes heuristic, regularization techniques, which are motivated by upper bounds on the generalization error. In this work, we study the generalization error of classifiers relying on stochastic encodings trained on the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds on the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information between input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds…
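
To make the setting concrete, here is a minimal sketch of a classifier with a stochastic encoding trained on cross-entropy plus a KL penalty. It assumes a diagonal Gaussian encoder, a standard normal prior, a linear softmax decoder, and a weight beta on the KL term. These choices, and every name and number in the code, are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng):
    """Draw u = mu + sigma * eps with eps ~ N(0, 1), one eps per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def cross_entropy(logits, label):
    """Negative log-softmax probability of the true label (numerically stable)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]

def variational_loss(mu, log_var, decoder_weights, label, beta, rng):
    """Single-sample estimate of cross-entropy + beta * KL for one input."""
    u = sample_latent(mu, log_var, rng)
    logits = [sum(w * ui for w, ui in zip(row, u)) for row in decoder_weights]
    return cross_entropy(logits, label) + beta * gaussian_kl_to_standard_normal(mu, log_var)

# Hypothetical encoder outputs for one input, and a 3-class linear decoder.
rng = random.Random(0)
mu, log_var = [0.3, -1.2], [-0.5, 0.1]
decoder_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

loss = variational_loss(mu, log_var, decoder_weights, label=0, beta=0.1, rng=rng)
print(f"single-sample loss: {loss:.4f}")
```

In this sketch the latent representation is resampled on every call, so the cross-entropy term is a noisy estimate, while the KL term is computed exactly from the encoder parameters.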

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Dropout