Transitive Invariance for Self-supervised Visual Representation Learning

Xiaolong Wang; Kaiming He; Abhinav Gupta

arXiv:1708.02901·cs.CV·August 16, 2017·22 cites

TL;DR

This paper introduces a self-supervised learning approach that organizes objects into a graph linked by different kinds of invariance. Reasoning over that graph yields more robust visual representations and improves performance on recognition tasks without relying on labeled data.

Contribution

The paper proposes organizing data that exhibits multiple kinds of invariance into a graph, then applying transitivity over the graph's edges to improve self-supervised visual representation learning.

Findings

01 Achieves 63.2% mAP on PASCAL VOC 2007 with Fast R-CNN

02 Close to supervised performance on COCO dataset with 23.5% mAP

03 Outperforms ImageNet pre-trained network in surface normal estimation

Abstract

Learning visual representations with self-supervised learning has become popular in computer vision. The idea is to design auxiliary tasks where labels are free to obtain. Most of these tasks end up providing data to learn specific kinds of invariance useful for recognition. In this paper, we propose to exploit different self-supervised approaches to learn representations invariant to (i) inter-instance variations (two objects in the same class should have similar features) and (ii) intra-instance variations (viewpoint, pose, deformations, illumination, etc.). Instead of combining two approaches with multi-task learning, we argue for organizing and reasoning about the data with multiple variations. Specifically, we propose to generate a graph with millions of objects mined from hundreds of thousands of videos. The objects are connected by two types of edges which correspond to two types of…
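
The graph reasoning described above can be illustrated with a small sketch. It is not the authors' implementation: the function name `transitive_pairs`, the node labels, and the rule of chaining one inter-instance edge with the intra-instance neighbours of its endpoints are assumptions made for illustration. The sketch only shows how transitivity could turn two kinds of edges into additional pairs of objects that should share similar features.

```python
from itertools import product


def transitive_pairs(inter_edges, intra_edges):
    """Derive new positive pairs by chaining edges (hypothetical helper).

    inter_edges: pairs of different object instances assumed to be similar.
    intra_edges: pairs of views of the same instance (e.g. from one video).
    Returns a set of frozensets, each a newly implied pair of nodes.
    """
    # Undirected adjacency over intra-instance edges.
    intra = {}
    for a, b in intra_edges:
        intra.setdefault(a, set()).add(b)
        intra.setdefault(b, set()).add(a)

    # Pairs that are already explicit edges are not reported again.
    known = {frozenset(e) for e in inter_edges}
    known |= {frozenset(e) for e in intra_edges}

    derived = set()
    for a, b in inter_edges:
        # Each endpoint stands for itself plus its other views.
        views_a = {a} | intra.get(a, set())
        views_b = {b} | intra.get(b, set())
        for u, v in product(views_a, views_b):
            pair = frozenset((u, v))
            if u != v and pair not in known:
                derived.add(pair)
    return derived


# Toy graph: A and B are different instances judged similar;
# A' and B' are other views of A and B respectively.
inter_edges = [("A", "B")]
intra_edges = [("A", "A'"), ("B", "B'")]
derived = transitive_pairs(inter_edges, intra_edges)
pairs = sorted(tuple(sorted(p)) for p in derived)
print(pairs)
```

In this toy graph, the single inter-instance edge between A and B implies three further pairs that link the other views. Such pairs would differ in both instance and viewpoint, which is the kind of combined invariance the abstract motivates.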

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Region Proposal Network · Softmax · Convolution · RoIPool · Faster R-CNN