Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation

Adil Kaan Akan; Yucel Yemez

arXiv:2501.15878 · cs.CV · March 4, 2025

TL;DR

SlotAdapt is a novel object-centric learning method that integrates slot attention with pretrained diffusion models using adapters, improving object discovery and compositional image generation without external supervision.

Contribution

The paper introduces adapters for slot-based conditioning in pretrained diffusion models and adds a guidance loss that aligns the adapters' cross-attention with slot attention, improving object-centric generation.
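To make the idea of slot-based conditioning concrete, here is a minimal sketch of a cross-attention step in which image features attend over a set of slot vectors. It is an illustration only, not the authors' implementation: the function name `slot_cross_attention`, the `gate` parameter, and the omission of learned query/key/value projections are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slot_cross_attention(features, slots, gate=1.0):
    """Hypothetical adapter step: each feature vector attends over the slots.

    features: N vectors of dimension d (used directly as queries).
    slots:    K vectors of dimension d (used directly as keys and values).
    gate:     scalar weight on the residual update (an assumption, not from the source).
    Returns the updated features and the N x K attention maps.
    """
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)
    updated, attn_maps = [], []
    for feat in features:
        # Scaled dot-product scores of this feature against every slot.
        weights = softmax([dot(feat, slot) * scale for slot in slots])
        # Weighted sum of slot vectors.
        mixed = [sum(w * slot[i] for w, slot in zip(weights, slots)) for i in range(dim)]
        # Residual update, so the base features pass through unchanged when gate is 0.
        updated.append([f + gate * m for f, m in zip(feat, mixed)])
        attn_maps.append(weights)
    return updated, attn_maps
```

The residual form is one common way to add a conditioning path to a frozen network, since the base features are left intact when the added path contributes nothing. Whether SlotAdapt uses exactly this form is not stated on this page.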

Findings

1. Outperforms state-of-the-art methods in object discovery and image generation

2. Effective on complex real-world images for compositional generation

3. Maintains generative power while reducing text-centric bias

Abstract

We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. Our method preserves the generative power of pretrained diffusion models, while avoiding their text-centric conditioning bias. We also incorporate an additional guidance loss into our architecture to align cross-attention from adapter layers with slot attention. This enhances the alignment of our model with the objects in the input image without using external supervision. Experimental results show that our method outperforms state-of-the-art techniques in object discovery and image generation tasks across multiple datasets, including those with real images. Furthermore, we demonstrate through experiments that our method performs remarkably well on complex real-world images for compositional generation, in contrast to…
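The abstract mentions a guidance loss that aligns adapter cross-attention with slot attention but does not give its form here. The sketch below assumes one plausible choice, a mean binary cross-entropy that treats the slot-attention map as the target and the cross-attention map as the prediction. The function name `guidance_loss`, the `eps` clamp, and the choice of loss are assumptions, not details confirmed by this page.

```python
import math


def guidance_loss(cross_attn, slot_attn, eps=1e-7):
    """Mean binary cross-entropy between two attention maps of equal shape.

    cross_attn: rows of values in [0, 1] from the adapter cross-attention (prediction).
    slot_attn:  rows of values in [0, 1] from slot attention (target).
    Both the loss form and the target/prediction roles are assumptions.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(cross_attn, slot_attn):
        for pred, target in zip(pred_row, target_row):
            # Clamp to avoid log(0).
            pred = min(max(pred, eps), 1.0 - eps)
            total -= target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred)
            count += 1
    return total / count


# Usage: a cross-attention map that agrees with the slot masks scores lower.
slot_masks = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
misaligned = [[0.1, 0.9], [0.9, 0.1]]
print(guidance_loss(aligned, slot_masks) < guidance_loss(misaligned, slot_masks))
```

Because the target comes from the model's own slot attention rather than from annotated masks, a loss of this kind needs no external supervision, which matches the claim in the abstract.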

Videos

Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation · SlidesLive

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Softmax · Attention Is All You Need · Diffusion · ALIGN · Adapter