Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick

TL;DR
MoCo introduces a dynamic dictionary approach to unsupervised visual representation learning. It produces features that are competitive under linear evaluation on ImageNet and transfer well, outperforming supervised pre-training on 7 detection/segmentation tasks.
Contribution
The paper proposes Momentum Contrast (MoCo), a novel method that builds a large, consistent dictionary for contrastive learning using a queue and a moving-averaged encoder.
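The two mechanisms named above can be sketched in a few lines. This is a minimal, framework-free illustration, not the authors' implementation. Parameters are assumed to be flat lists of floats, and the helper names `momentum_update` and `KeyQueue` are hypothetical. The key encoder's parameters θ_k track the query encoder's θ_q through θ_k ← m·θ_k + (1 − m)·θ_q instead of receiving gradients. The queue keeps only the most recent encoded keys, so the dictionary can be much larger than a mini-batch.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Return m * theta_k + (1 - m) * theta_q, element by element.

    Assumption: parameters are flat lists of floats. In a real model they
    would be the weights of two encoders with identical architecture.
    The key encoder is updated only this way, never by backpropagation.
    m=0.999 is an assumed default for illustration.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Hypothetical fixed-size FIFO dictionary of encoded keys.

    Enqueuing the newest mini-batch of keys evicts the oldest keys once the
    queue is full, so the dictionary size is decoupled from the batch size.
    """

    def __init__(self, max_size):
        # deque with maxlen silently drops items from the left when full.
        self._keys = deque(maxlen=max_size)

    def enqueue(self, batch_keys):
        for key in batch_keys:
            self._keys.append(key)

    def keys(self):
        # Oldest key first, newest key last.
        return list(self._keys)

    def __len__(self):
        return len(self._keys)
```

With m close to 1, the key encoder changes slowly. This keeps keys encoded in earlier mini-batches, which are still sitting in the queue, comparable with the keys encoded in the current one.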
Findings
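MoCo achieves competitive ImageNet classification results under the common linear evaluation protocol.
MoCo representations transfer well to downstream tasks.
MoCo can outperform supervised pre-training on 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes by large margins.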
Abstract
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
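Treating contrastive learning as dictionary look-up means scoring an encoded query q against one matching key k₊ and K queued keys, then training with the InfoNCE loss L_q = −log[exp(q·k₊/τ) / Σ_{i=0..K} exp(q·k_i/τ)], where the sum runs over the positive and all K negatives. The sketch below is a single-query, pure-Python illustration under stated assumptions. Vectors are nonzero lists of floats that get L2-normalized, the temperature τ defaults to 0.07 (assumed here), and the function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    # Assumes v is not the zero vector.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query (hypothetical helper).

    The positive key is placed at index 0 of the logits and the queued keys
    act as negatives, so this is the cross-entropy of a (1 + K)-way softmax
    whose correct class is 0.
    """
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp over all 1 + K logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    # -log softmax(logits)[0]
    return log_sum_exp - logits[0]
```

Because the positive always sits at index 0, batched implementations can compute the same quantity as a softmax cross-entropy over (K+1) logits with target class 0.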
Videos
Momentum Contrast for Unsupervised Visual Representation Learning · YouTube
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: InfoNCE · Random Grayscale · Random Horizontal Flip · Color Jitter · Random Resized Crop · Feature Pyramid Network · RoIAlign · Mask R-CNN · Region Proposal Network · Softmax
