Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization
Stefan Stojanov, Anh Thai, Zixuan Huang, James M. Rehg

TL;DR
This paper introduces DOPE, a self-supervised method for learning dense object representations from multiple views, enabling low-shot category recognition without category labels, and outperforming existing baselines.
Contribution
The work presents a novel self-supervised approach to learn dense discriminative object features from multiple views without category labels, facilitating low-shot recognition.
Findings
DOPE achieves competitive low-shot classification performance.
It outperforms supervised and self-supervised baselines.
Training assumes access to sparse depths, foreground masks, and known cameras, which are used to obtain pixel-level correspondences between views of an object.
Abstract
A hallmark of the deep learning era for computer vision is the successful use of large-scale labeled datasets to train feature representations for tasks ranging from object recognition and semantic segmentation to optical flow estimation and novel view synthesis of 3D scenes. In this work, we aim to learn dense discriminative object representations for low-shot category recognition without requiring any category labels. To this end, we propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels. To train DOPE, we assume access to sparse depths, foreground masks and known cameras, to obtain pixel-level correspondences between views of an object, and use this to formulate a self-supervised learning task to learn discriminative object patches. We find that DOPE can directly be used for low-shot…
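The abstract says pixel-level correspondences between views are obtained from sparse depths, foreground masks, and known cameras. The following is a minimal sketch of one common way to compute such a correspondence, assuming a pinhole camera model; it is not the authors' implementation. The function names, the `(fx, fy, cx, cy)` intrinsics tuple, and the `R_ab`, `t_ab` relative pose are hypothetical, introduced only for illustration.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole model


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with known depth to a 3D point in the camera frame."""
    fx, fy, cx, cy = k
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform p' = R p + t."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project(p: Vec3, k: Intrinsics) -> Optional[Tuple[float, float]]:
    """Project a 3D camera-frame point to pixel coordinates, or None if behind the camera."""
    fx, fy, cx, cy = k
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def correspond(
    u: float,
    v: float,
    depth: float,
    k: Intrinsics,
    R_ab: Mat3,
    t_ab: Vec3,
    mask_b: Sequence[Sequence[bool]],
) -> Optional[Tuple[float, float]]:
    """Map a pixel in view A to its location in view B.

    Returns None when the point falls behind camera B, outside the image,
    or outside the foreground mask of view B. Occlusion is not checked.
    """
    p_a = backproject(u, v, depth, k)
    p_b = transform(R_ab, t_ab, p_a)
    uv = project(p_b, k)
    if uv is None:
        return None
    col, row = round(uv[0]), round(uv[1])
    if not (0 <= row < len(mask_b) and 0 <= col < len(mask_b[0])):
        return None
    if not mask_b[row][col]:
        return None
    return uv
```

Because the depths are sparse, a function like this would only be called on pixels that have a depth value. The sketch also assumes both views share the same intrinsics, which need not hold in practice.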
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
