Revisiting Self-Supervised Visual Representation Learning
Alexander Kolesnikov, Xiaohua Zhai, Lucas Beyer

TL;DR
This paper conducts a comprehensive large-scale study of self-supervised visual representation learning, challenges common practices, and substantially improves existing methods, outperforming the previous state of the art.
Contribution
It provides new insights into CNN design choices for self-supervised learning and enhances existing techniques to achieve superior performance.
Findings
Standard CNN design recipes do not always benefit self-supervised learning.
Revisiting and refining existing models leads to significant performance improvements.
The improved models outperform the previously published state-of-the-art results by a large margin.
Abstract
Unsupervised visual representation learning remains a largely unsolved problem in computer vision research. Among a large body of recently proposed approaches for unsupervised learning of visual representations, a class of self-supervised techniques achieves superior performance on many challenging benchmarks. A large number of pretext tasks for self-supervised learning have been studied, but other important aspects, such as the choice of convolutional neural networks (CNN), have not received equal attention. Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large-scale study and, as a result, uncover multiple crucial insights. We challenge a number of common practices in self-supervised visual representation learning and observe that standard recipes for CNN design do not always translate to self-supervised representation learning. As part of…
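To make the idea of a "pretext task" concrete, here is a minimal sketch of rotation prediction, one of the pretext tasks the paper revisits: each image is rotated by 0, 90, 180 and 270 degrees, and the network is trained to predict which rotation was applied, so the labels come for free. This is an illustration assuming NumPy arrays of square images in (N, H, W, C) layout; the function name make_rotation_batch is hypothetical and is not taken from the authors' code.

```python
import numpy as np

def make_rotation_batch(images):
    """Build a rotation-prediction pretext batch (hypothetical helper).

    images: array of shape (N, H, W, C) with H == W.
    Returns (rotated, labels): rotated has shape (4N, H, W, C) and
    labels[i] = k means the image was rotated by k * 90 degrees.
    """
    images = np.asarray(images)
    # Rotations by 90 degrees only preserve the shape for square images.
    if images.ndim != 4 or images.shape[1] != images.shape[2]:
        raise ValueError("expected square images of shape (N, H, W, C)")
    rotated, labels = [], []
    for k in range(4):
        # np.rot90 rotates counter-clockwise in the (H, W) plane.
        rotated.append(np.rot90(images, k=k, axes=(1, 2)))
        # Every image in this copy of the batch gets the same label k.
        labels.append(np.full(images.shape[0], k, dtype=np.int64))
    return np.concatenate(rotated, axis=0), np.concatenate(labels, axis=0)

# Usage: two tiny 3x3 single-channel "images".
batch = np.arange(2 * 3 * 3 * 1).reshape(2, 3, 3, 1)
x, y = make_rotation_batch(batch)
print(x.shape)  # (8, 3, 3, 1)
print(y)        # [0 0 1 1 2 2 3 3]
```

A CNN trained with a 4-way classification loss on (x, y) learns features without any human annotation. Which CNN is used for such a task is exactly the design choice the study examines.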
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: Average Pooling · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling
