Constrained Sampling for Class-Agnostic Weakly Supervised Object Localization
Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf, Eric Granger

TL;DR
This paper introduces a novel approach for weakly-supervised object localization that leverages self-supervised vision transformers and a discriminative proposals sampling method to improve object localization accuracy.
Contribution
It proposes a new sampling method using pretrained CNN classifiers to generate pseudo-labels from transformer maps, enhancing class-specific object localization.
Findings
Outperforms state-of-the-art methods on the CUB dataset
Provides better foreground object coverage in activation maps
Effective in distinguishing objects of interest from background
Abstract
Self-supervised vision transformers can generate accurate localization maps of the objects in an image. However, since they decompose the scene into multiple maps containing various objects, and they do not rely on any explicit supervisory signal, they cannot distinguish the object of interest from other objects, as required in weakly-supervised object localization (WSOL). To address this issue, we propose leveraging the multiple maps generated by the different transformer heads to acquire pseudo-labels for training a WSOL model. In particular, a new discriminative proposals sampling method is introduced that relies on a pretrained CNN classifier to identify discriminative regions. Then, foreground and background pixels are sampled from these regions to train a WSOL model for generating activation maps that can accurately localize objects belonging to a specific class.…
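The sampling idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the fixed threshold `fg_thresh`, the rule of scoring each masked image and keeping the best map, and the toy scoring function are all assumptions made for illustration. In practice the score would come from a pretrained CNN classifier and the candidate maps from the transformer heads.

```python
import numpy as np

def select_discriminative_map(image, maps, score_fn):
    """Pick the candidate map whose masked image gets the highest score.

    score_fn is a stand-in (assumption) for a pretrained classifier's
    confidence on the masked image.
    """
    scores = [score_fn(image * m[..., None]) for m in maps]
    return int(np.argmax(scores))

def sample_pseudo_labels(act_map, n_fg, n_bg, rng, fg_thresh=0.5):
    """Sample sparse pixel pseudo-labels from one map.

    Returns an array shaped like act_map with 1 = foreground,
    0 = background, -1 = not sampled (to be ignored in a loss).
    The hard threshold is an illustrative assumption.
    """
    flat = act_map.ravel()
    fg_idx = np.flatnonzero(flat >= fg_thresh)
    bg_idx = np.flatnonzero(flat < fg_thresh)
    fg = rng.choice(fg_idx, size=min(n_fg, fg_idx.size), replace=False)
    bg = rng.choice(bg_idx, size=min(n_bg, bg_idx.size), replace=False)
    labels = np.full(flat.size, -1, dtype=np.int64)
    labels[fg] = 1
    labels[bg] = 0
    return labels.reshape(act_map.shape)

# Toy data: a 4x4 RGB image whose "object" is the red top-left block,
# and two hypothetical head maps covering opposite corners.
rng = np.random.default_rng(0)
image = np.zeros((4, 4, 3))
image[:2, :2, 0] = 1.0
maps = np.zeros((2, 4, 4))
maps[0, :2, :2] = 1.0
maps[1, 2:, 2:] = 1.0

def toy_score(masked_image):
    # Hypothetical classifier score: mean red intensity.
    return float(masked_image[..., 0].mean())

best = select_discriminative_map(image, maps, toy_score)
labels = sample_pseudo_labels(maps[best], n_fg=2, n_bg=3, rng=rng)
```

Leaving most pixels unlabeled (-1) reflects the fact that only the sampled pixels would act as pseudo-labels; how the WSOL model is then trained on them is not specified in the text above.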
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
