Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey
Oriane Siméoni, Éloi Zablocki, Spyros Gidaris, Gilles Puy, and Patrick Pérez

TL;DR
This survey reviews recent methods for unsupervised object localization using self-supervised Vision Transformers, highlighting approaches that discover objects without manual annotations in open-world vision systems.
Contribution
It provides a comprehensive overview of class-agnostic unsupervised object localization techniques leveraging self-supervised ViTs, including a curated repository of methods.
Findings
Recent methods successfully localize objects without annotations
Self-supervised ViTs enable class-agnostic object discovery
The survey consolidates current approaches and resources
Abstract
The recent enthusiasm for open-world vision systems shows the community's strong interest in performing perception tasks outside the closed-vocabulary benchmark setups that have been so popular until now. Being able to discover objects in images/videos without knowing in advance what objects populate the dataset is an exciting prospect. But how can one find objects without knowing anything about them? Recent works show that it is possible to perform class-agnostic unsupervised object localization by exploiting self-supervised pre-trained features. We propose here a survey of unsupervised object localization methods that discover objects in images without requiring any manual annotation in the era of self-supervised ViTs. We gather links to the discussed methods in the repository https://github.com/valeoai/Awesome-Unsupervised-Object-Localization.
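To make "exploiting self-supervised pre-trained features" concrete, here is a minimal sketch of one simple class-agnostic heuristic, assumed here for illustration and not presented as any specific surveyed method: build a similarity graph over ViT patch features, take as seed the patch correlated with the fewest others (on the assumption that background covers more of the image than the object), grow a mask from the seed's most similar low-degree patches, and read off a box. The function name `localize_object`, the parameter `k`, and the synthetic features are all hypothetical; in practice the features would come from a self-supervised ViT. For simplicity the box covers the whole mask rather than a single connected component.

```python
import numpy as np


def localize_object(features: np.ndarray, grid_h: int, grid_w: int, k: int = 4):
    """Hypothetical helper: return (mask, box) for one object.

    features: array of shape (grid_h * grid_w, dim), one row per patch.
    box: (row_min, col_min, row_max, col_max) in patch coordinates.
    """
    # Pairwise (unnormalized) similarity between all patch features.
    sim = features @ features.T
    # Degree: how many patches each patch is positively correlated with.
    degree = (sim >= 0).sum(axis=1)
    # Seed: the lowest-degree patch, assumed to lie on the object.
    seed = int(np.argmin(degree))
    # Expansion: among the k lowest-degree patches, keep those that are
    # positively correlated with the seed (the seed itself always qualifies).
    candidates = np.argsort(degree, kind="stable")[:k]
    support = [int(p) for p in candidates if sim[seed, p] >= 0]
    # Mask: patches whose summed similarity to the support set is non-negative.
    mask = (sim[:, support].sum(axis=1) >= 0).reshape(grid_h, grid_w)
    rows, cols = np.nonzero(mask)
    box = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
    return mask, box


# Synthetic stand-in for ViT patch features: a 6x6 patch grid where a
# 2x3 "object" region has features pointing away from the background.
grid_h, grid_w = 6, 6
rng = np.random.default_rng(0)
feats = np.tile([1.0, 0.0, 0.0], (grid_h * grid_w, 1))
obj = np.zeros((grid_h, grid_w), dtype=bool)
obj[1:3, 2:5] = True
feats[obj.ravel()] = [-1.0, 0.0, 1.0]
feats += 0.05 * rng.standard_normal(feats.shape)

mask, box = localize_object(feats, grid_h, grid_w)
print(box)  # (1, 2, 2, 4)
```

No labels are used anywhere: the box emerges only from the structure of the feature similarities, which is the sense in which such localization is class-agnostic and unsupervised.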
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
