ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
Eslam Abdelrahman, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny

TL;DR
ToddlerDiffusion introduces a modality-space diffusion framework that cascades interpretable stages for high-quality image generation. It uses a Schrödinger Bridge to determine the optimal transport between stages, improving efficiency and interpretability over existing methods.
Contribution
The paper presents a novel cascaded diffusion approach in modality space using a Schrödinger Bridge, enhancing interpretability, efficiency, and performance in image generation.
Findings
Outperforms state-of-the-art methods on multiple datasets.
Runs twice as fast as LDM with a smaller architecture.
Achieves high-quality image generation with interpretable stages.
Abstract
Diffusion models break down the challenging task of generating data from high-dimensional distributions into a series of easier denoising steps. Inspired by this paradigm, we propose a novel approach that extends the diffusion framework into modality space, decomposing the complex task of RGB image generation into simpler, interpretable stages. Our method, termed ToddlerDiffusion, cascades modality-specific models, each responsible for generating an intermediate representation, such as contours, palettes, and detailed textures, ultimately culminating in a high-quality RGB image. Instead of relying on the naive LDM concatenation conditioning mechanism to connect the different stages together, we employ Schrödinger Bridge to determine the optimal transport between different modalities. Although employing a cascaded pipeline introduces more stages, which could lead to a more complex…
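The cascade-plus-bridge idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain Brownian bridge as a stand-in for the Schrödinger Bridge, and the names bridge_sample, run_cascade, sketch, and palette are hypothetical, chosen only for this example. The point it shows is that a bridge starts exactly at one modality and ends exactly at the next, rather than starting from pure noise with the previous stage concatenated as a condition.

```python
import numpy as np


def bridge_sample(x_src, x_tgt, t, sigma=1.0, rng=None):
    """Sample a Brownian-bridge state at time t in [0, 1].

    The bridge is pinned at x_src when t = 0 and at x_tgt when t = 1.
    Assumption: a Brownian bridge is used here only as a simple,
    well-known example of a process connecting two modalities.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(x_src.shape)
    mean = (1.0 - t) * x_src + t * x_tgt       # linear path between endpoints
    std = sigma * np.sqrt(t * (1.0 - t))       # zero at both endpoints
    return mean + std * noise


def run_cascade(stages, start):
    """Feed each stage's output into the next and keep every intermediate.

    Keeping the intermediates is what makes the pipeline inspectable:
    each entry corresponds to one modality (e.g. contours, palette, RGB).
    """
    outputs = [start]
    for stage in stages:
        outputs.append(stage(outputs[-1]))
    return outputs


rng = np.random.default_rng(0)
sketch = rng.random((8, 8))    # hypothetical contour map
palette = rng.random((8, 8))   # hypothetical palette map
midpoint = bridge_sample(sketch, palette, t=0.5, sigma=0.1, rng=rng)
```

In a trained system each entry of stages would be a learned model for one modality. Here any callable works, which is enough to show how the intermediate outputs stay available for inspection or editing.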
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Computer Graphics and Visualization Techniques
Methods: Diffusion
