On Mutual Information in Contrastive Learning for Visual Representations
Mike Wu, Chengxu Zhuang, Milan Mosse, Daniel Yamins, Noah Goodman

TL;DR
This paper interprets contrastive learning for visual representations as maximizing a lower bound on the mutual information between views of an image. It shows that the choice of negative samples strongly affects performance and provides a unified framework whose objectives improve results on transfer tasks.
Contribution
It introduces a mutual information bound that generalizes the InfoNCE objective to allow negatives sampled from a restricted region of "difficult" contrasts, highlights the importance of negative sample difficulty, and simplifies and stabilizes existing objectives.
Findings
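The choice of negative samples and of views is critical; drawing negatives from a restricted region of difficult contrasts improves representation quality.
The new mutual information-based objectives yield representations that outperform previous approaches on transfer tasks including classification, bounding box detection, and instance segmentation.
Reformulating earlier contrastive objectives in terms of mutual information unifies them and also simplifies and stabilizes them.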
Abstract
In recent years, several unsupervised, "contrastive" learning algorithms in vision have been shown to learn representations that perform remarkably well on transfer tasks. We show that this family of algorithms maximizes a lower bound on the mutual information between two or more "views" of an image where typical views come from a composition of image augmentations. Our bound generalizes the InfoNCE objective to support negative sampling from a restricted region of "difficult" contrasts. We find that the choice of negative samples and views are critical to the success of these algorithms. Reformulating previous learning objectives in terms of mutual information also simplifies and stabilizes them. In practice, our new objectives yield representations that outperform those learned with previous approaches for transfer to classification, bounding box detection, instance segmentation, and…
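To make the InfoNCE objective mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `info_nce_loss`, the `temperature` and `top_k` parameters, and the use of cosine similarity are all assumptions made for illustration. The optional `top_k` argument keeps only the negatives most similar to the anchor, as one simple stand-in for sampling from a restricted region of "difficult" contrasts; the paper's own sampling scheme may differ.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so that dot() gives cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce_loss(anchor, positive, negatives, temperature=0.1, top_k=None):
    """InfoNCE loss for one anchor: -log softmax of the positive's score.

    anchor, positive: embeddings of two views of the same image.
    negatives: embeddings of views of other images.
    top_k: hypothetical knob; if set, keep only the top_k negatives that are
           most similar to the anchor (the "hardest" ones).
    """
    a = normalize(anchor)
    pos_logit = dot(a, normalize(positive)) / temperature
    neg_logits = [dot(a, normalize(n)) / temperature for n in negatives]

    if top_k is not None:
        # Higher similarity to the anchor means a harder negative.
        neg_logits = sorted(neg_logits, reverse=True)[:top_k]

    logits = [pos_logit] + neg_logits
    # Numerically stable log-sum-exp over the positive and the negatives.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - pos_logit


anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negative = [0.0, 1.0]
hard_negative = [1.0, 0.3]

print(info_nce_loss(anchor, positive, [easy_negative]))
print(info_nce_loss(anchor, positive, [hard_negative]))
```

In this sketch a negative that lies close to the anchor contributes far more to the loss than a distant one, which is the intuition behind focusing on difficult negatives. In the standard InfoNCE setting with K candidates (one positive plus K - 1 negatives), log K minus the expected loss is a lower bound on the mutual information between the two views; whether and how that carries over to restricted negatives is what the paper's generalized bound addresses, and this sketch makes no claim about it.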
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging
Methods: InfoNCE
