Controllable Generation with Text-to-Image Diffusion Models: A Survey
Pu Cao, Feng Zhou, Qing Song, Lu Yang

TL;DR
This survey reviews recent advances in controllable text-to-image (T2I) diffusion models, covering both the theoretical mechanisms and the practical methods for conditioning image generation on signals beyond the text prompt.
Contribution
It gives a comprehensive overview of control mechanisms in T2I diffusion models, categorizes approaches by conditioning strategy, and covers both theoretical foundations and practical implementations.
Findings
Explains how conditions are integrated into diffusion models.
Categorizes controllable generation methods by condition type: specific conditions, multiple conditions, and universal conditions.
Provides a curated repository of related literature.
Abstract
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generative functions. However, relying solely on text for conditioning these models does not fully cater to the varied and complex requirements of different applications and scenarios. Acknowledging this shortfall, a variety of studies aim to control pre-trained text-to-image (T2I) models to support novel conditions. In this survey, we undertake a thorough review of the literature on controllable generation with T2I diffusion models, covering both the theoretical foundations and practical advancements in this domain. Our review begins with a brief introduction to the basics of denoising diffusion probabilistic models (DDPMs) and widely used T2I diffusion models. We then reveal the controlling mechanisms of…
Taxonomy
Topics: Multimedia Communication and Technology · Video Analysis and Summarization
Methods: Diffusion
