Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases
Senthil Purushwalkam, Abhinav Gupta

TL;DR
This paper analyzes contrastive self-supervised learning methods, revealing their strengths and limitations in invariance learning, and proposes leveraging unstructured videos to improve viewpoint invariance and downstream task performance.
Contribution
It provides a detailed analysis of invariances learned by contrastive methods and introduces a novel approach using unstructured videos to enhance viewpoint invariance.
Findings
MOCO and PIRL learn occlusion-invariant representations
MOCO and PIRL fail to capture viewpoint and category instance invariance, both of which are crucial for object recognition
Training on unstructured videos improves viewpoint invariance and downstream task performance
Abstract
Self-supervised representation learning approaches have recently surpassed their supervised learning counterparts on downstream tasks like object detection and image classification. Somewhat mysteriously, the recent gains in performance come from training instance classification models, treating each image and its augmented versions as samples of a single class. In this work, we first present quantitative experiments to demystify these gains. We demonstrate that approaches like MOCO and PIRL learn occlusion-invariant representations. However, they fail to capture viewpoint and category instance invariance, which are crucial components for object recognition. Second, we demonstrate that these approaches obtain further gains from access to a clean object-centric training dataset like ImageNet. Finally, we propose an approach to leverage unstructured videos to learn representations that…
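The instance classification objective mentioned in the abstract (each image and its augmented versions treated as one class) is commonly implemented with an InfoNCE-style loss. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function name `info_nce_loss`, the `temperature` value, and the use of in-batch negatives are illustrative assumptions, and the momentum encoder and negative queue used by Momentum Contrast are omitted.

```python
import numpy as np


def info_nce_loss(queries, keys, temperature=0.07):
    """InfoNCE loss over a batch of embedding pairs (illustrative sketch).

    queries[i] and keys[i] are assumed to be embeddings of two augmented
    views of the same image; every other key in the batch is a negative.
    Both arrays have shape (batch_size, embedding_dim).
    """
    # L2-normalize so that dot products are cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature: shape (N, N).
    logits = (q @ k.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The "correct class" for query i is key i, i.e. the diagonal.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views_a = rng.normal(size=(8, 16))
    # Hypothetical second view: the same embedding with small noise.
    views_b = views_a + 0.01 * rng.normal(size=(8, 16))

    print("matched pairs:   ", info_nce_loss(views_a, views_b))
    print("mismatched pairs:", info_nce_loss(views_a, np.roll(views_b, 1, axis=0)))
```

In this sketch the loss is small when each query is closest to its own key and large when the pairing is shuffled, which is the sense in which the model is trained to classify instances.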
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare
Methods: Jigsaw · PIRL · Batch Normalization · InfoNCE · Momentum Contrast
