Contrastive Multiview Coding
Yonglong Tian, Dilip Krishnan, Phillip Isola

TL;DR
This paper introduces a multiview contrastive learning method that learns view-invariant representations by maximizing mutual information across multiple sensory views, achieving state-of-the-art results in unsupervised image and video learning.
Contribution
It presents a scalable, view-agnostic contrastive learning framework that outperforms cross-view prediction methods and shows that adding more views improves the learned semantic representations.
Findings
Contrastive loss outperforms cross-view prediction.
More views lead to better semantic representations.
Achieves state-of-the-art results on image and video benchmarks.
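The finding that more views help depends on how a two-view contrastive loss is combined across M views. The paper describes two ways to do this. In a "core view" arrangement, one view is contrasted against each of the others. In a "full graph" arrangement, every pair of views is contrasted. The sketch below shows only that combination logic. `pair_loss` is a hypothetical placeholder for any two-view contrastive loss, and treating `views[0]` as the core view is an illustrative assumption.

```python
from itertools import combinations
from typing import Callable, Sequence

import numpy as np

# Hypothetical signature: any loss that contrasts two batches of view embeddings.
PairLoss = Callable[[np.ndarray, np.ndarray], float]


def core_view_loss(views: Sequence[np.ndarray], pair_loss: PairLoss) -> float:
    """Contrast views[0] (assumed core view) with each other view: M - 1 terms."""
    core = views[0]
    return sum(pair_loss(core, v) for v in views[1:])


def full_graph_loss(views: Sequence[np.ndarray], pair_loss: PairLoss) -> float:
    """Contrast every unordered pair of views: M * (M - 1) / 2 terms."""
    return sum(pair_loss(a, b) for a, b in combinations(views, 2))


# With a constant pair loss, the totals simply count the terms each arrangement uses.
views = [np.zeros((2, 3)) for _ in range(4)]
print(core_view_loss(views, lambda a, b: 1.0), full_graph_loss(views, lambda a, b: 1.0))
```

With four views, the core-view arrangement sums 3 pairwise terms and the full graph sums 6. The full graph therefore grows quadratically in cost but lets every view inform every other.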
Abstract
Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based…
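The contrastive loss in the abstract pulls together embeddings of two views of the same scene and pushes apart embeddings of different scenes. Below is a minimal NumPy sketch of an InfoNCE-style two-view loss, symmetrized over both anchor directions. It rests on several assumptions. Negatives are drawn from the same batch, whereas the paper's own implementation draws them from a memory bank. Embeddings are L2-normalized. The temperature `tau=0.07` is an illustrative default. The function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(anchor: np.ndarray, positive: np.ndarray, tau: float = 0.07) -> float:
    """One direction: row i of `anchor` should match row i of `positive`;
    the other rows of `positive` act as in-batch negatives (assumption)."""
    logits = l2_normalize(anchor) @ l2_normalize(positive).T / tau  # (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching pair (the diagonal) as the target class.
    return float(-np.mean(np.diag(log_probs)))


def two_view_loss(z1: np.ndarray, z2: np.ndarray, tau: float = 0.07) -> float:
    """Symmetric loss: anchor on view 1, then on view 2."""
    return info_nce(z1, z2, tau) + info_nce(z2, z1, tau)


rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
aligned = two_view_loss(z1, z1 + 0.01 * rng.normal(size=z1.shape))  # views agree
unrelated = two_view_loss(z1, rng.normal(size=(8, 16)))  # views share nothing
print(aligned, unrelated)
```

Aligned views give a much lower loss than unrelated ones, which is the behavior the objective rewards. For context, InfoNCE-style losses are commonly tied to mutual information through the bound I ≥ log N − L for N candidates per anchor, applied to each directional term.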
Taxonomy
Topics: Image Enhancement Techniques · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
Methods: InfoNCE · Random Horizontal Flip · Random Resized Crop · Adam · SGD with Momentum · Weight Decay · Contrastive Multiview Coding · Grouped Convolution · Dropout · Local Response Normalization
