Masked Multi-Query Slot Attention for Unsupervised Object Discovery

Rishav Pramanik; José-Fabian Villa-Vásquez; Marco Pedersoli

arXiv:2404.19654 · cs.CV · November 7, 2024

TL;DR

This paper introduces masked multi-query slot attention for unsupervised object discovery. The method improves object localization by focusing on salient regions and learning multiple sets of slots, and it is evaluated on PASCAL-VOC 2012.

Contribution

It proposes a masking scheme on the input features and a multi-query extension of slot attention, which together improve unsupervised object discovery performance.
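The masking idea can be illustrated with a minimal sketch. It assumes that a per-patch saliency score is already available (for example, derived from DINO attention maps) and that a fixed fraction of the highest-scoring patches is kept as foreground. The helper `mask_background` and its `keep_ratio` parameter are hypothetical, not taken from the official repository, and the paper's actual selection rule may differ. Here, background tokens are dropped rather than zeroed so they cannot attract any slot.

```python
import numpy as np

def mask_background(features, saliency, keep_ratio=0.5):
    """Keep only the most salient patch tokens (hypothetical helper).

    features:   (N, D) patch features, e.g. DINO ViT tokens.
    saliency:   (N,) per-patch saliency scores, assumed to be given.
    keep_ratio: fraction of patches treated as foreground (assumed value).
    Returns the kept tokens (in original order) and the boolean keep-mask.
    """
    n = features.shape[0]
    n_keep = max(1, int(round(keep_ratio * n)))
    # Indices of the n_keep highest-saliency patches.
    top = np.argsort(saliency)[::-1][:n_keep]
    keep = np.zeros(n, dtype=bool)
    keep[top] = True
    # Drop background tokens instead of zeroing them.
    return features[keep], keep

features = np.arange(8, dtype=float).reshape(4, 2)
saliency = np.array([0.9, 0.1, 0.5, 0.2])
kept, keep = mask_background(features, saliency, keep_ratio=0.5)
print(keep.tolist())  # [True, False, True, False]
print(kept.tolist())  # [[0.0, 1.0], [4.0, 5.0]]
```

Dropping tokens changes the sequence length from image to image. A batched implementation would more likely keep a fixed shape and pass the boolean mask into the attention computation instead.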

Findings

01. Improved object localization accuracy on PASCAL-VOC 2012.

02. Masking background regions enhances focus on salient objects.

03. Multi-query approach yields more stable and accurate masks.
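This page does not say how multiple slot sets are combined. Slot order is arbitrary, so any combination first has to align the sets. The sketch below shows one hypothetical way to do this. Each set is aligned to the first by the permutation that maximizes total cosine similarity, and the aligned sets are then averaged. The function names and the averaging rule are assumptions, not the paper's procedure. The exhaustive search is only practical for a handful of slots; for larger K, a Hungarian solver such as SciPy's `linear_sum_assignment` would replace the permutation loop.

```python
import itertools
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarity between rows of a (K, D) and b (K, D)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def best_permutation(ref, other):
    """Permutation p maximising sum_i cos(ref[i], other[p[i]]).

    Exhaustive search over all K! permutations, so only for small K.
    """
    sim = cosine_matrix(ref, other)
    k = ref.shape[0]
    best, best_score = None, -np.inf
    for perm in itertools.permutations(range(k)):
        score = sum(sim[i, perm[i]] for i in range(k))
        if score > best_score:
            best, best_score = perm, score
    return list(best)

def merge_slot_sets(slot_sets):
    """Align every slot set to the first one, then average (hypothetical rule)."""
    ref = slot_sets[0]
    aligned = [ref]
    for other in slot_sets[1:]:
        aligned.append(other[best_permutation(ref, other)])
    return np.mean(aligned, axis=0)

ref = np.array([[1.0, 0.0], [0.0, 1.0]])
other = np.array([[0.0, 2.0], [3.0, 0.0]])     # same directions, swapped order
print(best_permutation(ref, other))            # [1, 0]
print(merge_slot_sets([ref, other]).tolist())  # [[2.0, 0.0], [0.0, 1.5]]
```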

Abstract

Unsupervised object discovery is becoming an essential line of research for tackling recognition problems that require decomposing an image into entities, such as semantic segmentation and object detection. Recently, object-centric methods that leverage self-supervision have gained popularity due to their simplicity and adaptability to different settings and conditions. However, those methods do not exploit effective techniques already employed in modern self-supervised approaches. In this work, we consider an object-centric approach in which DINO ViT features are reconstructed via a set of queried representations called slots. Based on that, we propose a masking scheme on input features that selectively disregards the background regions, inducing our model to focus more on salient objects during the reconstruction phase. Moreover, we extend the slot attention to a multi-query…
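The core mechanism described above is reconstructing DINO ViT features from a small set of slots. It builds on slot attention (Locatello et al., 2020). The NumPy sketch below shows a simplified version. Slots act as queries, the softmax is taken over the slot axis so that slots compete for tokens, and each slot is updated to a weighted mean of the tokens it claims. It omits the GRU, MLP, and layer normalization of the original formulation and uses random projections instead of learned ones. The `reconstruct` function is a hypothetical stand-in for the learned decoder, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention_step(slots, inputs, w_q, w_k, w_v, eps=1e-8):
    """One simplified slot-attention iteration (no GRU, MLP, or LayerNorm).

    slots:  (K, D) current slots, used as queries.
    inputs: (N, D) input tokens, e.g. (masked) DINO patch features.
    w_q, w_k, w_v: (D, D) projections (random here, learned in practice).
    Returns the new slots (K, D) and the attention map (N, K).
    """
    d = slots.shape[1]
    q = slots @ w_q                        # (K, D)
    k = inputs @ w_k                       # (N, D)
    v = inputs @ w_v                       # (N, D)
    logits = k @ q.T / np.sqrt(d)          # (N, K)
    # Softmax over the slot axis: slots compete for each token.
    attn = softmax(logits, axis=1)
    # Normalise over tokens so each update is a weighted mean of values.
    weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
    return weights.T @ v, attn             # (K, D), (N, K)

def reconstruct(slots, attn, w_out):
    """Hypothetical stand-in decoder: each token is rebuilt as an
    attention-weighted mixture of projected slots."""
    return attn @ (slots @ w_out)          # (N, D)

rng = np.random.default_rng(0)
D, N, K = 8, 16, 3                          # feature dim, tokens, slots
inputs = rng.normal(size=(N, D))
slots = rng.normal(size=(K, D))
w_q, w_k, w_v, w_out = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(4))

for _ in range(3):                          # a few refinement iterations
    slots, attn = slot_attention_step(slots, inputs, w_q, w_k, w_v)

# attn comes from the last iteration; training would minimise this loss.
recon = reconstruct(slots, attn, w_out)
loss = np.mean((recon - inputs) ** 2)       # feature-reconstruction objective
print(attn.shape, recon.shape)              # (16, 3) (16, 8)
```

Each row of `attn` sums to 1 over the slots, so the attention map can be read directly as a soft assignment of patches to objects. This is the kind of mask the findings above refer to.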

Code & Models

Repositories

rishavpramanik/maskedmultiqueryslot (PyTorch, official)

Taxonomy

Topics: Data Management and Algorithms · Web Data Mining and Analysis · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Residual Connection · Softmax · Vision Transformer · self-DIstillation with NO labels