Object-Centric Slot Diffusion
Jindong Jiang, Fei Deng, Gautam Singh, Sungjin Ahn

TL;DR
This paper introduces Latent Slot Diffusion (LSD), an object-centric learning model that uses a latent diffusion model conditioned on object slots in place of a conventional slot decoder. Trained without supervision, it improves scene understanding and generation and outperforms existing methods, especially in complex scenes.
Contribution
LSD is the first object-centric model to replace slot decoders with a latent diffusion model conditioned on object slots, enabling unsupervised compositional generation without annotations.
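To make the idea of a diffusion decoder conditioned on object slots concrete, the following is a minimal NumPy sketch of one noise-prediction training step. It is not the authors' implementation. Everything here is an assumption made for illustration: the toy single-layer denoiser, conditioning through cross-attention over the slots, the weight names `Wq`, `Wk`, `Wv`, `Wo`, the shapes, and the fixed noise level `alpha_bar_t`. The summary above does not specify any of these details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def add_noise(z0, eps, alpha_bar_t):
    # Standard forward diffusion step: blend the clean latent with Gaussian noise.
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

def cross_attend(tokens, slots, Wq, Wk, Wv):
    # Latent tokens query the slots (assumed conditioning mechanism).
    q = tokens @ Wq                                   # (N, d)
    k = slots @ Wk                                    # (K, d)
    v = slots @ Wv                                    # (K, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (N, K), rows sum to 1
    return attn @ v, attn

def toy_denoiser(z_t, slots, params):
    # Hypothetical one-layer stand-in for the denoising network.
    ctx, attn = cross_attend(z_t, slots, params["Wq"], params["Wk"], params["Wv"])
    eps_pred = (z_t + ctx) @ params["Wo"]
    return eps_pred, attn

def denoising_loss(z0, slots, params, alpha_bar_t, rng):
    # Mean squared error between the true and the predicted noise.
    eps = rng.standard_normal(z0.shape)
    z_t = add_noise(z0, eps, alpha_bar_t)
    eps_pred, attn = toy_denoiser(z_t, slots, params)
    return float(np.mean((eps_pred - eps) ** 2)), attn

rng = np.random.default_rng(0)
N, K, d = 16, 4, 8  # latent tokens, slots, feature width (all illustrative)
params = {name: 0.1 * rng.standard_normal((d, d)) for name in ("Wq", "Wk", "Wv", "Wo")}
z0 = rng.standard_normal((N, d))     # stands in for an autoencoder latent
slots = rng.standard_normal((K, d))  # stands in for slots from an object-centric encoder
loss, attn = denoising_loss(z0, slots, params, alpha_bar_t=0.5, rng=rng)
print(loss, attn.shape)
```

In this sketch the slots are the only conditioning signal, so no text or other annotation enters the loss. The attention map `attn` assigns each latent token a distribution over slots, which is one plausible place for object-level structure to appear in such a model.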
Findings
LSD outperforms state-of-the-art transformer decoders in complex scenes.
LSD demonstrates superior unsupervised compositional generation quality.
Pre-trained diffusion models enhance LSD's real-world segmentation and generation capabilities.
Abstract
The recent success of transformer-based image generative models in object-centric learning highlights the importance of powerful image generators for handling complex scenes. However, despite the high expressiveness of diffusion models in image generation, their integration into object-centric learning remains largely unexplored. In this paper, we explore the feasibility and potential of integrating diffusion models into object-centric learning and investigate the pros and cons of this approach. We introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes: it is the first object-centric learning model to replace conventional slot decoders with a latent diffusion model conditioned on object slots, and it is also the first unsupervised compositional conditional diffusion model that operates without the need for supervised annotations like text. Through…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Latent Diffusion Model · Diffusion
