Contrastive Multiview Coding

Yonglong Tian; Dilip Krishnan; Phillip Isola

arXiv:1906.05849·cs.CV·December 21, 2020·571 cites

TL;DR

This paper introduces a multiview contrastive learning method that learns view-invariant representations by maximizing mutual information across multiple sensory views, achieving state-of-the-art results in unsupervised image and video learning.
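The objective behind "maximizing mutual information across views" can be sketched as an InfoNCE-style contrastive loss. Below is a minimal NumPy sketch. It assumes L2-normalized embeddings, a temperature of 0.07, and in-batch negatives: row i of one view is the positive for row i of the other view, and every other row is a negative. The paper's released code may draw negatives differently (for example from a memory bank). The function names `info_nce`, `symmetric_loss`, and `mi_lower_bound` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(z1, z2, temperature=0.07):
    """One-directional InfoNCE loss (sketch, in-batch negatives assumed).

    Row i of z1 and row i of z2 are embeddings of two views of the same sample
    (a positive pair); all other rows of z2 act as negatives for row i of z1.
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / temperature               # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # stabilize exp; log-softmax is unchanged
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy with the diagonal as target

def symmetric_loss(z1, z2, temperature=0.07):
    """Contrast view 1 against view 2 and view 2 against view 1, then sum."""
    return info_nce(z1, z2, temperature) + info_nce(z2, z1, temperature)

def mi_lower_bound(loss, n):
    """InfoNCE bound I(v1; v2) >= log(N) - L for a one-directional loss over N candidates."""
    return np.log(n) - loss

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
aligned = info_nce(z1, z1 + 0.01 * rng.normal(size=z1.shape))  # views that agree
unrelated = info_nce(z1, rng.normal(size=(8, 16)))            # views that share nothing
```

Views that agree drive the loss toward zero and push the bound toward log(N). Unrelated views leave the loss around log(N) or higher. Because the bound is capped at log(N), larger pools of negatives can certify more mutual information.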

Contribution

It presents a scalable, view-agnostic contrastive learning framework that outperforms cross-view prediction methods and whose representations become more semantic as more views are added.

Findings

01. Contrastive loss outperforms cross-view prediction.

02. More views lead to better semantic representations.

03. Achieves state-of-the-art results on image and video benchmarks.

Abstract

Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based…
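The claim that the approach "scales to any number of views" can be illustrated by summing pairwise contrastive losses over views. The full paper describes two such formulations, which are not reproduced in the abstract above. In the "core view" form, one view is contrasted with every other view. In the "full graph" form, every pair of views is contrasted. The NumPy sketch below is hypothetical and assumes the same in-batch InfoNCE loss (temperature 0.07, L2-normalized embeddings) for each symmetric pair. Treating `views[0]` as the core view is an illustrative choice.

```python
import itertools
import numpy as np

def info_nce(za, zb, temperature=0.07):
    """One-directional InfoNCE with in-batch negatives (assumed, not the paper's exact code)."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability only
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def pair_loss(za, zb, temperature=0.07):
    """Symmetric two-view loss: contrast a against b and b against a."""
    return info_nce(za, zb, temperature) + info_nce(zb, za, temperature)

def core_view_loss(views, temperature=0.07):
    """Core view: contrast views[0] with each remaining view (M - 1 pair terms)."""
    core = views[0]
    return sum(pair_loss(core, v, temperature) for v in views[1:])

def full_graph_loss(views, temperature=0.07):
    """Full graph: contrast every unordered pair of views (M * (M - 1) / 2 pair terms)."""
    return sum(pair_loss(a, b, temperature)
               for a, b in itertools.combinations(views, 2))

rng = np.random.default_rng(1)
views = [rng.normal(size=(6, 8)) for _ in range(3)]  # three views, 6 samples, 8-dim embeddings
```

With two views both forms reduce to the same symmetric loss. The core-view cost grows linearly with the number of views M, while the full graph grows quadratically but uses every pairwise relation.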

Taxonomy

Topics: Image Enhancement Techniques · Video Surveillance and Tracking Methods · Advanced Vision and Imaging

Methods: InfoNCE · Random Horizontal Flip · Random Resized Crop · Adam · SGD with Momentum · Weight Decay · Contrastive Multiview Coding · Grouped Convolution · Dropout · Local Response Normalization