On Mutual Information Maximization for Representation Learning

Michael Tschannen; Josip Djolonga; Paul K. Rubenstein; Sylvain Gelly; Mario Lucic

arXiv:1907.13625 · cs.LG · January 24, 2020 · 219 cites

TL;DR

This paper critically examines mutual information maximization in unsupervised representation learning, highlighting its limitations and emphasizing the importance of inductive biases and architecture choices in the success of these methods.

Contribution

It provides empirical evidence that the effectiveness of MI-based methods depends heavily on the inductive biases of the feature extractor architecture and of the MI estimator's parameterization, not on the properties of mutual information alone.

Findings

01

MI estimation is challenging and can lead to entangled representations.

02

Success of MI-based methods relies on inductive biases in architecture and estimator design.

03

Connection established between MI maximization and deep metric learning.
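Finding 01 rests on the fact that MI does not change under any invertible transformation of either variable. So a "nice" representation and an arbitrarily scrambled, invertible version of it score the same.

The toy illustration below is our own, not from the paper. It uses a small, made-up discrete joint distribution. Relabelling X with a bijection leaves I(X;Y) unchanged, while a non-invertible map that merges two values of X can lower it. The function names `mutual_information` and `relabel_x` and the probabilities in `joint` are hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(X;Y) in nats for a discrete joint distribution {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p  # marginal of X
        py[y] += p  # marginal of Y
    return sum(
        p * math.log(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def relabel_x(joint, f):
    """Push the joint distribution through x -> f(x), summing merged cells."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(f(x), y)] += p
    return dict(out)


# Hypothetical joint distribution over x in {0, 1, 2}, y in {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2, (2, 1): 0.2}

bijection = {0: 2, 1: 0, 2: 1}  # invertible relabelling of X
collapse = {0: 0, 1: 0, 2: 1}   # non-invertible: merges x=0 and x=1

i_orig = mutual_information(joint)
i_perm = mutual_information(relabel_x(joint, bijection.get))
i_coll = mutual_information(relabel_x(joint, collapse.get))

print(i_orig, i_perm, i_coll)
```

The first two printed values agree up to floating-point rounding, and the third is smaller. The MI objective alone therefore cannot tell the original representation apart from any invertible reshuffling of it.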

Abstract

Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning…
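Many of the methods the abstract describes maximize the InfoNCE lower bound on MI, one of the estimators the paper examines. Given K paired views (x_i, y_i) and a critic f, the bound is the batch average of log( exp f(x_i, y_i) / ((1/K) Σ_j exp f(x_i, y_j)) ).

Below is a minimal sketch under these assumptions:
- embeddings are plain Python lists;
- the critic is the inner product f(x_i, y_j) = <g(x_i), h(y_j)>, a separable parameterization.

The function names and toy embeddings are hypothetical, and this is not the authors' code.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def logsumexp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def infonce_lower_bound(gx, hy):
    """InfoNCE estimate of I(X;Y) from K paired embeddings.

    gx[i] and hy[i] embed two views of the same example; every other
    hy[j] in the batch acts as a negative for gx[i].
    """
    k = len(gx)
    total = 0.0
    for i in range(k):
        scores = [dot(gx[i], hy[j]) for j in range(k)]  # critic values f(x_i, y_j)
        # log( e^{f_ii} / ((1/K) * sum_j e^{f_ij}) )
        total += scores[i] - (logsumexp(scores) - math.log(k))
    return total / k


# Hypothetical toy batch of K = 4 two-dimensional embeddings.
aligned = [[3.0, 0.0], [0.0, 3.0], [-3.0, 0.0], [0.0, -3.0]]
shuffled = aligned[1:] + aligned[:1]  # positives deliberately mismatched

print(infonce_lower_bound(aligned, aligned))   # slightly below log(4) ≈ 1.386
print(infonce_lower_bound(aligned, shuffled))  # strongly negative
```

Each per-example term equals log K plus a log-softmax, so the estimate can never exceed log K, however strongly the two views depend on each other. Negating it gives a softmax cross-entropy over pairwise similarities with the matching pair as the target. This has the same form as the multi-class N-pair loss from deep metric learning, which is one way to see the connection mentioned at the end of the abstract.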

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Image and Video Retrieval Techniques