Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation
Adil Kaan Akan, Yucel Yemez

TL;DR
SlotAdapt is a novel object-centric learning method that integrates slot attention with pretrained diffusion models using adapters, improving object discovery and compositional image generation without external supervision.
Contribution
The paper introduces adapters for slot-based conditioning in pretrained diffusion models, adds a guidance loss that aligns cross-attention in the adapter layers with slot attention, and in doing so strengthens object-centric generation.
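To make the alignment idea concrete, here is a minimal sketch of what a guidance loss between adapter cross-attention and slot attention could look like. The summary does not give the loss formula, so everything here is an assumption for illustration: the binary cross-entropy form, the choice of slot attention as the fixed target, and the names `normalize_over_slots` and `guidance_loss`. Attention maps are plain nested lists indexed as `[slot][pixel]`.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_over_slots(logits):
    """Turn logits[slot][pixel] into attention maps that sum to 1 over slots
    at every pixel, so slots compete for each pixel."""
    num_slots, num_pixels = len(logits), len(logits[0])
    per_pixel = [
        softmax([logits[k][p] for k in range(num_slots)])
        for p in range(num_pixels)
    ]
    return [[per_pixel[p][k] for p in range(num_pixels)] for k in range(num_slots)]


def guidance_loss(cross_attn, slot_attn, eps=1e-7):
    """Hypothetical guidance loss: mean binary cross-entropy that treats the
    slot-attention map as the target and the adapter cross-attention map as
    the prediction (an assumed form, not taken from the paper)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(cross_attn, slot_attn):
        for pred, target in zip(pred_row, target_row):
            pred = min(max(pred, eps), 1.0 - eps)  # keep log() finite
            total -= target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred)
            count += 1
    return total / count


# Toy example: two slots, two pixels.
slot_attn = normalize_over_slots([[2.0, 0.0], [0.0, 2.0]])
aligned = normalize_over_slots([[2.0, 0.0], [0.0, 2.0]])   # same slot-to-pixel assignment
swapped = normalize_over_slots([[0.0, 2.0], [2.0, 0.0]])   # slots attend to the wrong pixels
loss_aligned = guidance_loss(aligned, slot_attn)
loss_swapped = guidance_loss(swapped, slot_attn)
```

In this toy case the loss is lower when the cross-attention map assigns pixels to the same slots as slot attention, which is the behaviour an alignment term of this kind is meant to reward. No labels or masks from outside the model are involved, consistent with the claim of no external supervision.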
Findings
Outperforms state-of-the-art methods in object discovery and image generation across multiple datasets, including real-image datasets
Effective on real-world complex images for compositional generation
Maintains generative power while reducing text-centric bias
Abstract
We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. Our method preserves the generative power of pretrained diffusion models, while avoiding their text-centric conditioning bias. We also incorporate an additional guidance loss into our architecture to align cross-attention from adapter layers with slot attention. This enhances the alignment of our model with the objects in the input image without using external supervision. Experimental results show that our method outperforms state-of-the-art techniques in object discovery and image generation tasks across multiple datasets, including those with real images. Furthermore, we demonstrate through experiments that our method performs remarkably well on complex real-world images for compositional generation, in contrast to…
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning
Methods: Softmax · Attention Is All You Need · Diffusion · ALIGN · Adapter
