MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
Hongjia Liu, Rongzhen Zhao, Haohan Chen, Joni Pajarinen

TL;DR
MetaSlot is a plug-and-play Slot Attention variant that adapts to varying object counts, improving object representation and interpretability in object-centric learning models.
Contribution
MetaSlot is a Slot Attention variant that adapts to the number of objects in a scene. It maintains a codebook of object prototypes, removes duplicate slots by quantizing them against that codebook, and injects noise, which improves performance and interpretability.
Findings
Significant performance improvements across multiple datasets.
More interpretable and stable object representations.
Effective handling of variable object counts.
Abstract
Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods adopt Slot Attention or its variants to iteratively aggregate objects' super-pixels into a fixed set of query feature vectors, termed slots. However, their reliance on a static slot count leads to an object being represented as multiple parts when the number of objects varies. We introduce MetaSlot, a plug-and-play Slot Attention variant that adapts to variable object counts. MetaSlot (i) maintains a codebook that holds prototypes of objects in a dataset by vector-quantizing the resulting slot representations; (ii) removes duplicate slots from the traditionally aggregated slots by quantizing them with the codebook; and (iii) injects progressively…
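Step (ii) of the abstract, removing duplicate slots by quantizing them with a codebook, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`nearest_code`, `deduplicate_slots`), the squared Euclidean distance, and the "keep the first slot per code" rule are all assumptions made for illustration, and the toy vectors are invented.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(slot: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook prototype closest to the slot (hypothetical helper)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(slot, codebook[k]))


def deduplicate_slots(
    slots: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[int]]:
    """Quantize each slot to its nearest prototype and keep one slot per prototype.

    Assumption: slots mapped to the same prototype are treated as duplicates,
    and the first one encountered is kept. Returns (codes, kept_slot_indices).
    """
    codes = [nearest_code(slot, codebook) for slot in slots]
    seen = set()
    kept = []
    for index, code in enumerate(codes):
        if code not in seen:
            seen.add(code)
            kept.append(index)
    return codes, kept


# Toy example: three prototypes, four slots, two of which share a prototype.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
slots = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
codes, kept = deduplicate_slots(slots, codebook)
print(codes)  # [0, 1, 1, 2]
print(kept)   # [0, 1, 3]
```

In this toy case the second and third slots fall on the same prototype, so only three of the four slots survive, which is the sense in which the effective slot count can follow the number of objects rather than a fixed setting.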
Taxonomy
Topics: Parallel Computing and Optimization Techniques
