GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

TL;DR
GLASS introduces a novel guided diffusion approach for object-centric learning, significantly improving slot representations and enabling realistic scene generation in complex real-world images.
Contribution
The paper proposes Guided Latent Slot Diffusion (GLASS), a slot-attention model that learns in the space of generated images and uses semantic and instance guidance modules to learn better slot embeddings, outperforming existing slot-attention methods on real-world datasets.
Findings
Outperforms state-of-the-art slot-attention methods by a wide margin on (zero-shot) object discovery and conditional image generation for real-world scenes
Enables realistic scene generation with complex textures and shapes
First application of slot attention to compositional scene generation
Abstract
Object-centric learning aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable a variety of downstream tasks. Yet, object-centric learning struggles on real-world datasets, which contain multiple objects of complex textures and shapes in natural everyday scenes. To address this, we introduce Guided Latent Slot Diffusion (GLASS), a novel slot-attention model that learns in the space of generated images and uses semantic and instance guidance modules to learn better slot embeddings for various downstream tasks. Our experiments show that GLASS surpasses state-of-the-art slot-attention methods by a wide margin on tasks such as (zero-shot) object discovery and conditional image generation for real-world scenes. Moreover, GLASS enables the first application of slot attention to the compositional generation of complex,…
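For readers unfamiliar with the mechanism, the decomposition of an image into slots that the abstract describes can be illustrated with a minimal sketch. The code below is a generic, heavily simplified slot-attention update in NumPy, not the GLASS model: it omits the learned query/key/value projections, the recurrent slot update, the diffusion decoder, and the semantic and instance guidance modules. The function names, shapes, and random features are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention_step(slots, features, eps=1e-8):
    """One simplified slot-attention update (illustrative, no learned weights).

    slots:    (K, D) array of current slot vectors
    features: (N, D) array of per-location image features
    """
    d = slots.shape[1]
    # Dot-product similarity between every feature and every slot: (N, K)
    logits = features @ slots.T / np.sqrt(d)
    # Softmax over the slot axis, so slots compete for each feature
    attn = softmax(logits, axis=1)
    # Normalize over features so each slot takes a weighted mean of its inputs
    weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
    new_slots = weights.T @ features  # (K, D)
    return new_slots, attn


def run_slot_attention(features, num_slots=4, iters=3, seed=0):
    """Iterate the update from randomly initialized slots."""
    rng = np.random.default_rng(seed)
    slots = rng.normal(size=(num_slots, features.shape[1]))
    attn = None
    for _ in range(iters):
        slots, attn = slot_attention_step(slots, features)
    return slots, attn


if __name__ == "__main__":
    # Hypothetical input: 16 feature vectors of dimension 8
    feats = np.random.default_rng(1).normal(size=(16, 8))
    slots, attn = run_slot_attention(feats)
    print(slots.shape, attn.shape)
```

Each row of `attn` sums to one across slots, which is what makes the slots compete to explain each image location. In a full model the resulting slot vectors would condition a decoder; here they are simply returned.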
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Diffusion · ALIGN
