Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

TL;DR
This paper introduces AdaSlot, a novel object-centric learning framework that dynamically determines the number of object slots based on data content, improving flexibility and performance over fixed-slot models.
Contribution
We propose an adaptive slot attention mechanism with a discrete slot sampling module and masked slot decoder, enabling dynamic slot number determination in object discovery tasks.
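As a rough illustration of the two components named above, the sketch below samples a hard keep/drop decision per slot with Gumbel-Softmax noise, then renormalizes per-pixel mixing weights over the kept slots only. It is a minimal pure-Python sketch under assumptions, not the authors' implementation: the names `sample_keep_mask` and `masked_alpha`, the two-logit keep/drop parameterization, and the way dropped slots are masked in the decoder are all hypothetical choices made for illustration.

```python
import math
import random

def gumbel_noise(rng):
    # Sample Gumbel(0, 1) by inverse transform; clamp u to avoid log(0).
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def sample_keep_mask(keep_logits, drop_logits, tau=1.0, rng=None):
    """Hypothetical helper: one hard keep (1) / drop (0) decision per slot.

    Returns (hard, soft), where soft is the relaxed keep probability of
    the two-way Gumbel-Softmax at temperature tau.
    """
    rng = rng or random.Random()
    hard, soft = [], []
    for lk, ld in zip(keep_logits, drop_logits):
        a = (lk + gumbel_noise(rng)) / tau
        b = (ld + gumbel_noise(rng)) / tau
        m = max(a, b)  # subtract the max for numerical stability
        ea, eb = math.exp(a - m), math.exp(b - m)
        p_keep = ea / (ea + eb)
        soft.append(p_keep)
        hard.append(1 if p_keep >= 0.5 else 0)
    return hard, soft

def masked_alpha(alpha_logits, keep_mask):
    """Hypothetical helper: softmax over slots for one pixel, kept slots only.

    Dropped slots receive weight exactly 0, so they cannot contribute to
    the reconstruction (an assumed form of "masked" decoding).
    """
    kept = [l for l, k in zip(alpha_logits, keep_mask) if k]
    if not kept:
        return [0.0] * len(alpha_logits)
    m = max(kept)
    exps = [math.exp(l - m) if k else 0.0
            for l, k in zip(alpha_logits, keep_mask)]
    z = sum(exps)
    return [e / z for e in exps]
```

In a differentiable framework, the hard decision would typically be used in the forward pass while gradients flow through the soft probability (the straight-through trick); this sketch shows only the forward computation.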
Findings
Achieves performance comparable or superior to fixed-slot models.
Demonstrates ability to adapt slot number to data complexity.
Validates effectiveness across various datasets.
Abstract
Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots. This not only necessitates prior knowledge of the dataset but also overlooks the inherent variability in the number of objects present in each instance. To overcome this fundamental limitation, we present a novel complexity-aware object auto-encoder framework. Within this framework, we introduce an adaptive slot attention (AdaSlot) mechanism that dynamically determines the optimal number of slots based on the content of the data.…
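The iterative refinement performed by slot attention can be sketched as follows. This is a simplified illustration, assuming plain dot-product attention with no learned projections and replacing the learned (GRU-based) slot update of slot attention with a direct weighted mean; `slot_attention_step` and `refine` are hypothetical names. The feature it preserves is that the softmax is taken over slots, so slots compete for each input feature.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified refinement step (assumption: no learned weights)."""
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)
    # attn[i][k]: share of input i assigned to slot k (softmax over slots).
    attn = [softmax([scale * dot(x, s) for s in slots]) for x in inputs]
    new_slots = []
    for k in range(len(slots)):
        weights = [attn[i][k] for i in range(len(inputs))]
        total = sum(weights) + eps
        # Each slot moves to the attention-weighted mean of the inputs.
        new_slots.append([
            sum(w * x[d] for w, x in zip(weights, inputs)) / total
            for d in range(dim)
        ])
    return new_slots, attn

def refine(slots, inputs, iters=3):
    """Apply the step a fixed number of times, returning slots and attention."""
    attn = []
    for _ in range(iters):
        slots, attn = slot_attention_step(slots, inputs)
    return slots, attn
```

Note that the number of slots here is fixed by the length of the initial `slots` list, which is exactly the predefined quantity the abstract identifies as a limitation.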
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
1. Designing a module that adaptively selects the number of slots used for slot attention is an interesting and exciting research direction.
2. The complexity-aware object-encoder framework has a simple and clear design, which makes the function of every component easy to understand.
3. This paper provides a variety of experimental studies to better demonstrate the properties of the proposed method.
1. This paper is somewhat difficult to follow. For example, it proposes an object-centric learning method, but the title refers to one of its applications, object discovery. Meanwhile, there are no object discovery experiments to support the title, which leaves readers confused about the target of the paper. In addition, the writing of the method section could be improved. For example, it is difficult to follow the authors to understand what are the structures of the two p
- The proposed method is well-motivated.
- The proposed method is shown to be effective in dynamically determining the number of slots within the attention mechanism.
- The proposed method is simple and easy to implement.
- The proposed method has little novelty since it is a straightforward combination of existing methods: slot attention (Locatello et al., 2020), clustering number selection (Blei & Jordan, 2006), and Gumbel-Softmax (Jang et al., 2016). The authors are encouraged to offer more insights into how these modules interact with each other and the specific roles they play individually. This added detail would contribute to a deeper understanding of the method's functionality and its overall design ratio
- Reasonable approach to the problem while still being computationally efficient and differentiable
- Considers a problem that seems useful, improving the determination of the "correct" number of objects
- Extensive ablations and analysis to understand the method are impressive
- The premise of the paper hinges on the assumption that it is desirable to have the slots correspond exactly to objects. While this appears reasonable in simpler datasets, the notion of what an object is becomes less clear for more realistic datasets (such as COCO, which only has certain types of objects labeled at a very particular granularity level). In the end, the goal of unsupervised object discovery is to do something useful with the objects. Therefore, I believe that when arguing whether
- The paper is well-written and easy to follow
- The number of slots as a hyper-parameter is indeed a long-standing problem in the field. This paper proposes a valid solution to it
- The experiments show promising segmentation results on both synthetic and real-world datasets
I have two main concerns regarding the paper:
- The experiments in the paper mainly focus on object segmentation. While it is an important outcome of OCL, the quality of the learned object slots is another important aspect. The authors conduct object property prediction experiments on CLEVR10. However, CLEVR10 is too simple. I would suggest that the authors run further experiments on MOVi, e.g., following the protocol in LSD [1]
- The goal of this paper is to get rid of a pre-defined number of slots. However,
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Machine Learning and Data Classification
