Momentum Contrast for Unsupervised Visual Representation Learning

Kaiming He; Haoqi Fan; Yuxin Wu; Saining Xie; Ross Girshick

arXiv:1911.05722 · cs.CV · March 25, 2020 · 1.0k cites

TL;DR

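MoCo treats contrastive learning as dictionary look-up and builds a large, consistent dictionary on the fly from a queue and a moving-averaged encoder. The learned features are competitive under the common linear protocol on ImageNet and can outperform supervised pre-training on 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets.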

Contribution

The paper proposes Momentum Contrast (MoCo), a novel method that builds a large, consistent dictionary for contrastive learning using a queue and a moving-averaged encoder.
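
The two mechanisms named above can be sketched in a few lines of plain Python. This is a toy illustration, not the authors' implementation: it assumes an encoder can be reduced to a flat list of float parameters, and the names `momentum_update` and `KeyQueue`, the momentum value, and the queue size are all hypothetical choices made for the example.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Return key-encoder parameters moved slightly toward the query encoder.

    Each parameter becomes m * key + (1 - m) * query, so a large m
    (an assumed default here) makes the key encoder change slowly.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO store of encoded keys acting as the dictionary."""

    def __init__(self, max_size):
        # A deque with maxlen silently discards the oldest entries
        # once the capacity is exceeded.
        self.keys = deque(maxlen=max_size)

    def enqueue(self, batch_keys):
        """Append the newest mini-batch of keys; the oldest ones fall out."""
        self.keys.extend(batch_keys)

    def __len__(self):
        return len(self.keys)


# Toy usage with made-up numbers.
query_params = [1.0, 2.0]
key_params = [0.0, 0.0]
key_params = momentum_update(key_params, query_params, m=0.9)

queue = KeyQueue(max_size=4)
queue.enqueue([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
queue.enqueue([[0.7, 0.8], [0.9, 1.0]])  # pushes out the oldest key
```

The queue decouples the dictionary size from the mini-batch size, while the slow-moving key encoder keeps keys from different mini-batches comparable to one another.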

Findings

01

MoCo achieves competitive results under the common linear protocol on ImageNet classification.

02

MoCo representations transfer well to downstream tasks.

03

MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes by large margins.

Abstract

We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
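
The dictionary look-up view can be made concrete with an InfoNCE-style loss (listed under Methods on this page): a query should score highly against its one matching key and poorly against all other keys in the dictionary. The sketch below is a hedged, pure-Python illustration. The function name `info_nce_loss`, the temperature default, and the use of dot-product similarity on plain lists are assumptions for the example, not details given in the abstract.

```python
import math


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """Softmax cross-entropy where the positive key is the correct class.

    query, positive_key: lists of floats (assumed L2-normalized).
    negative_keys: list of key vectors drawn from the dictionary.
    temperature: assumed scaling constant for the similarity scores.
    """

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Index 0 holds the positive logit; the rest are negatives.
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, k) / temperature for k in negative_keys]

    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits)
    )

    # -log(softmax probability of the positive key).
    return log_sum_exp - logits[0]


# Toy usage: a well-matched positive gives a lower loss than a mismatched one.
negatives = [[0.0, 1.0], [-1.0, 0.0]]
good = info_nce_loss([1.0, 0.0], [1.0, 0.0], negatives)
bad = info_nce_loss([1.0, 0.0], [0.0, 1.0], negatives)
```

A larger set of negatives makes the look-up harder, which is the motivation for keeping a large dictionary.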

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: InfoNCE · Random Grayscale · Random Horizontal Flip · Color Jitter · Random Resized Crop · Feature Pyramid Network · RoIAlign · Mask R-CNN · Region Proposal Network · Softmax