A Classification approach towards Unsupervised Learning of Visual Representations
Aditya Vora

TL;DR
This paper introduces an unsupervised learning method for visual representations using a classification task on foreground and background patches mined from unlabeled videos, achieving competitive object recognition performance.
Contribution
It proposes an unsupervised approach that mines foreground and background patches from unlabeled videos and uses them to train a classification model.
Findings
Achieves 45.3 mAP on PASCAL VOC 2007
Uses only 150,000 unlabeled videos for training
Performs close to state-of-the-art unsupervised methods
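For readers unfamiliar with the metric, the sketch below shows one way an mAP figure can be computed. The summary does not state the evaluation protocol behind the 45.3 figure, so this assumes the 11-point interpolated average precision used by the PASCAL VOC 2007 challenge, applied to a simple ranked list of predictions. The function names `voc07_ap` and `mean_ap` are hypothetical and do not come from the paper.

```python
import numpy as np

def voc07_ap(scores, is_positive):
    """11-point interpolated average precision for one class (assumed protocol)."""
    # Rank predictions from most to least confident.
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_positive, dtype=float)[order]
    tp = np.cumsum(hits)          # true positives at each rank
    fp = np.cumsum(1.0 - hits)    # false positives at each rank
    recall = tp / max(hits.sum(), 1.0)
    precision = tp / (tp + fp)
    ap = 0.0
    # Average the best precision achievable at recall >= t for t = 0, 0.1, ..., 1.
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
    return ap

def mean_ap(per_class):
    """Mean of per-class APs, reported on a 0-100 scale."""
    return 100.0 * float(np.mean([voc07_ap(s, y) for s, y in per_class]))
```

Here `per_class` is a list of `(scores, is_positive)` pairs, one per object class. A detection benchmark would additionally need a box-matching step before this ranking stage, which is omitted here.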
Abstract
In this paper, we present a technique for unsupervised learning of visual representations. Specifically, we train a model on a foreground and background classification task, in the process of which it learns visual representations. Foreground and background patches for training are mined from hundreds and thousands of unlabelled videos available on the web, which we extract using a proposed patch extraction algorithm. Without using any supervision, with just 150,000 unlabelled videos and the PASCAL VOC 2007 dataset, we train an object recognition model that achieves 45.3 mAP, which is close to the best performing unsupervised feature learning technique and better than many other proposed algorithms. The code for patch extraction is implemented in Matlab and available open source at the following link.
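The abstract does not describe how the patch extraction algorithm works, so the following is only an illustrative sketch of the general idea of mining labeled patches from video without human annotation. It assumes a simple frame-differencing heuristic: a patch is called foreground when enough of its pixels change between two consecutive grayscale frames. The function `mine_patches` and its parameters (`patch`, `motion_thresh`, `fg_fraction`) are hypothetical and are not taken from the paper or its Matlab code.

```python
import numpy as np

def mine_patches(prev_frame, next_frame, patch=8, motion_thresh=0.1, fg_fraction=0.5):
    """Cut next_frame into non-overlapping patches and label each one.

    Assumed heuristic (not the paper's algorithm): a patch is foreground (1)
    when at least fg_fraction of its pixels changed by more than
    motion_thresh between the two frames, otherwise background (0).
    """
    diff = np.abs(next_frame.astype(np.float64) - prev_frame.astype(np.float64))
    moving = diff > motion_thresh
    h, w = moving.shape
    patches, labels = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            # Fraction of pixels in this patch that moved.
            frac = moving[y:y + patch, x:x + patch].mean()
            patches.append(next_frame[y:y + patch, x:x + patch])
            labels.append(1 if frac >= fg_fraction else 0)
    return np.stack(patches), np.array(labels)
```

The resulting `(patches, labels)` pairs could then serve as training data for a binary foreground/background classifier, which is the pretext task the abstract describes.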
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
