Controllable Generation with Text-to-Image Diffusion Models: A Survey

Pu Cao; Feng Zhou; Qing Song; Lu Yang

arXiv:2403.04279 · cs.CV · January 9, 2026 · 5 cites

TL;DR

This survey reviews recent advances in controllable text-to-image diffusion models, analyzing theoretical mechanisms and practical methods for conditionally guiding image generation beyond simple text prompts.

Contribution

It provides a comprehensive overview of controlling mechanisms in T2I diffusion models, categorizing approaches based on different conditioning strategies and offering insights into theoretical foundations and practical implementations.

Findings

01. Revealed how conditions are integrated into diffusion models.

02. Categorized controllable generation methods into specific, multiple, and universal conditions.

03. Provided a curated repository of related literature.

Abstract

In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generative functions. However, relying solely on text for conditioning these models does not fully cater to the varied and complex requirements of different applications and scenarios. Acknowledging this shortfall, a variety of studies aim to control pre-trained text-to-image (T2I) models to support novel conditions. In this survey, we undertake a thorough review of the literature on controllable generation with T2I diffusion models, covering both the theoretical foundations and practical advancements in this domain. Our review begins with a brief introduction to the basics of denoising diffusion probabilistic models (DDPMs) and widely used T2I diffusion models. We then reveal the controlling mechanisms of…

Code & Models

Repositories

PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models (Official)

Taxonomy

Topics: Multimedia Communication and Technology · Video Analysis and Summarization

Methods: Diffusion