Grouped Discrete Representation Guides Object-Centric Learning
Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

TL;DR
This paper introduces Grouped Discrete Representation (GDR), which groups features into attributes and indexes them with tuples, improving the convergence and generalization of object-centric learning.
Contribution
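The paper proposes GDR, which groups features into attributes and indexes them with tuples. This addresses two limitations of existing discrete representations: treating features as minimal units overlooks the attributes that compose them, and indexing features with natural numbers loses attribute-level commonalities and characteristics.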
Findings
GDR consistently improves convergence across various models and datasets.
GDR enhances generalization in object-centric learning tasks.
Visualizations confirm GDR captures attribute-level information effectively.
Abstract
Similar to humans perceiving visual scenes as objects, Object-Centric Learning (OCL) can abstract dense images or videos into sparse object-level features. Transformer-based OCL handles complex textures well due to the decoding guidance of discrete representation, obtained by discretizing noisy features in image or video feature maps using template features from a codebook. However, treating features as minimal units overlooks their composing attributes, thus impeding model generalization; indexing features with natural numbers loses attribute-level commonalities and characteristics, thus diminishing heuristics for model convergence. We propose Grouped Discrete Representation (GDR) to address these issues by grouping features into attributes and indexing them with tuple numbers. In extensive experiments across different query initializations, dataset modalities, and model…
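To make the contrast between natural-number indexes and tuple indexes concrete, the following is a minimal sketch and not the authors' implementation. It assumes a feature vector is split into contiguous channel groups, each with its own small attribute codebook, and that a tuple index maps to a scalar index by mixed-radix arithmetic. All function names, the group sizes, and the toy codebooks are hypothetical.

```python
def scalar_to_tuple(index, sizes):
    """Decompose a scalar code index into one index per attribute group
    (mixed-radix digits, most significant group first)."""
    digits = []
    for size in reversed(sizes):
        index, digit = divmod(index, size)
        digits.append(digit)
    return tuple(reversed(digits))


def tuple_to_scalar(indexes, sizes):
    """Inverse of scalar_to_tuple: fold per-group indexes into one scalar."""
    index = 0
    for digit, size in zip(indexes, sizes):
        index = index * size + digit
    return index


def quantize_grouped(feature, attribute_codebooks):
    """Split the feature's channels into groups (assumed contiguous) and pick
    the nearest template in each group's codebook by squared distance."""
    result = []
    start = 0
    for codebook in attribute_codebooks:
        width = len(codebook[0])
        chunk = feature[start:start + width]
        distances = [
            sum((c - t) ** 2 for c, t in zip(chunk, template))
            for template in codebook
        ]
        result.append(distances.index(min(distances)))
        start += width
    return tuple(result)


# Two attribute groups with 4 values each stand in for 16 scalar codes.
sizes = (4, 4)
code_a = scalar_to_tuple(6, sizes)
code_b = scalar_to_tuple(7, sizes)

# Toy codebooks: two groups, two 2-channel templates per group.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
grouped_index = quantize_grouped([0.9, 1.1, 0.1, -0.2], codebooks)
```

Under these assumptions, scalar codes 6 and 7 look unrelated as natural numbers, but their tuple forms share the first attribute index, which is the kind of attribute-level commonality the abstract says scalar indexing loses.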
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Image and Video Retrieval Techniques
