Localizing Objects with Self-Supervised Transformers and no Labels

Oriane Siméoni; Gilles Puy; Huy V. Vo; Simon Roburin; Spyros Gidaris; Andrei Bursuc; Patrick Pérez; Renaud Marlet; Jean Ponce

arXiv:2109.14279·cs.CV·October 19, 2021·107 cites

TL;DR

LOST is a self-supervised transformer-based method for object localization in images that outperforms existing approaches without requiring labels or external proposals, and can improve detection results when used for training.

Contribution

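This paper introduces LOST, an object localization method that uses the activation features of a vision transformer pre-trained with self-supervision. It operates on individual images without external object proposals and outperforms state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012.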

Findings

01. Outperforms state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012.

02. Training a class-agnostic detector on the discovered objects boosts results by another 7 points.

03. Shows promising results on the unsupervised object discovery task.

Abstract

Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, requires neither external object proposals nor any exploration of the image collection; it operates on a single image. Yet, we outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012. We also show that training a class-agnostic detector on the discovered objects boosts results by another 7 points. Moreover, we show promising results on the unsupervised object discovery task. The code to reproduce our results can be found at https://github.com/valeoai/LOST.
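
The abstract reports gains in CorLoc points without defining the metric. As a rough illustration only, the sketch below computes CorLoc under the common convention that an image counts as correctly localized when its single predicted box has an intersection-over-union (IoU) above 0.5 with at least one ground-truth box. The function names, the box layout (x_min, y_min, x_max, y_max), and the strict `>` at the threshold are assumptions made here, not taken from the paper's code.

```python
from typing import Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(
    predictions: Sequence[Box],
    ground_truths: Sequence[Sequence[Box]],
    threshold: float = 0.5,  # assumed convention; treated as a strict bound
) -> float:
    """Percentage of images whose predicted box matches any ground-truth box."""
    if len(predictions) != len(ground_truths):
        raise ValueError("need one prediction and one ground-truth list per image")
    if not predictions:
        return 0.0
    hits = sum(
        any(iou(pred, gt) > threshold for gt in gts)
        for pred, gts in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Two toy images: the first prediction matches exactly, the second misses.
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [[(0, 0, 10, 10)], [(20, 20, 30, 30)]]
    print(corloc(preds, gts))
```

With the two toy images above, one prediction is a hit and the other a miss, so the script prints 50.0. Under this reading, an 8-point CorLoc gain means that 8 more images out of every 100 receive a box that sufficiently overlaps a labeled object.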

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Dense Connections · Residual Connection · Vision Transformer