Revisiting Self-Supervised Visual Representation Learning

Alexander Kolesnikov; Xiaohua Zhai; Lucas Beyer

arXiv:1901.09005 · cs.CV · January 28, 2019 · 68 cites

TL;DR

This paper presents a large-scale study of self-supervised visual representation learning, challenges several common practices, and improves existing methods enough to outperform previously published state-of-the-art results.

Contribution

It provides new insights into CNN design choices for self-supervised learning and enhances existing techniques to achieve superior performance.

Findings

01

Standard CNN design recipes do not always benefit self-supervised learning.

02

Revisiting and refining existing models leads to significant performance improvements.

03

The improved models outperform previously published state-of-the-art results by a large margin.

Abstract

Unsupervised visual representation learning remains a largely unsolved problem in computer vision research. Among a big body of recently proposed approaches for unsupervised learning of visual representations, a class of self-supervised techniques achieves superior performance on many challenging benchmarks. A large number of pretext tasks for self-supervised learning have been studied, but other important aspects, such as the choice of convolutional neural networks (CNN), have not received equal attention. Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large-scale study and, as a result, uncover multiple crucial insights. We challenge a number of common practices in self-supervised visual representation learning and observe that standard recipes for CNN design do not always translate to self-supervised representation learning. As part of…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Average Pooling · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling