ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge

Eslam Abdelrahman; Liangbing Zhao; Vincent Tao Hu; Matthieu Cord; Patrick Perez; Mohamed Elhoseiny

arXiv:2311.14542·cs.CV·October 8, 2024·1 cites

Open Access

TL;DR

ToddlerDiffusion introduces a modality-space diffusion framework that cascades interpretable stages for high-quality image generation, leveraging Schrödinger Bridge for optimal transport, resulting in improved efficiency and interpretability over existing methods.

Contribution

The paper presents a novel cascaded diffusion approach in modality space using Schrödinger Bridge, enhancing interpretability, efficiency, and performance in image generation.

Findings

01. Outperforms state-of-the-art methods on multiple datasets.

02. Operates twice as fast as LDM with a smaller architecture.

03. Achieves high-quality image generation with interpretable stages.

Abstract

Diffusion models break down the challenging task of generating data from high-dimensional distributions into a series of easier denoising steps. Inspired by this paradigm, we propose a novel approach that extends the diffusion framework into modality space, decomposing the complex task of RGB image generation into simpler, interpretable stages. Our method, termed ToddlerDiffusion, cascades modality-specific models, each responsible for generating an intermediate representation, such as contours, palettes, and detailed textures, ultimately culminating in a high-quality RGB image. Instead of relying on the naive LDM concatenation conditioning mechanism to connect the different stages together, we employ Schrödinger Bridge to determine the optimal transport between different modalities. Although employing a cascaded pipeline introduces more stages, which could lead to a more complex…
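
To make the bridge idea in the abstract concrete, here is a minimal sketch of how an intermediate state between two modalities could be sampled. It assumes a plain Brownian bridge pinned at a source representation (for example, a contour) at t = 0 and a target representation (for example, a palette) at t = 1, which is one common way to instantiate a Schrödinger Bridge between paired samples. The function name `bridge_sample`, the `sigma` parameter, and the toy vectors are hypothetical illustrations and are not taken from the paper; the paper's actual formulation may differ.

```python
import math
import random


def bridge_sample(x_src, x_tgt, t, sigma=1.0, rng=None):
    """Sample a Brownian-bridge state between x_src (t=0) and x_tgt (t=1).

    Assumption (not from the paper): the marginal at time t is Gaussian with
    mean (1 - t) * x_src + t * x_tgt and standard deviation
    sigma * sqrt(t * (1 - t)), applied independently per element.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(x_src) != len(x_tgt):
        raise ValueError("source and target must have the same length")
    rng = rng or random.Random()
    # The noise scale vanishes at both endpoints, so the bridge is pinned there.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
        for a, b in zip(x_src, x_tgt)
    ]


# Toy stand-ins for two modalities, flattened to short vectors (hypothetical data).
contour = [0.0, 1.0, 0.0]
palette = [0.2, 0.8, 0.5]

# A noisy intermediate state halfway between the two modalities.
midway = bridge_sample(contour, palette, t=0.5, sigma=0.1, rng=random.Random(0))
```

Unlike concatenation conditioning, where the previous stage's output is only an extra input to a model that starts from pure noise, a bridge of this kind starts the process at the source modality itself, so the endpoints are reproduced exactly at t = 0 and t = 1.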

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Computer Graphics and Visualization Techniques

Methods: Diffusion