Text-to-image Diffusion Models in Generative AI: A Survey
Chenshuang Zhang, Chaoning Zhang, Mengchun Zhang, In So Kweon, Junmo Kim

TL;DR
This survey comprehensively reviews the development, methods, applications, challenges, and future directions of text-to-image diffusion models in generative AI, highlighting their evolution and expanding use cases.
Contribution
It provides an organized overview of pioneering methods, improvements, and applications of text-to-image diffusion models, along with discussions on challenges and future prospects.
Findings
Summarizes key advancements in text-to-image diffusion models.
Highlights applications beyond image generation, such as video and editing.
Discusses challenges and future research directions.
Abstract
This survey reviews the progress of diffusion models in generating images from text, i.e., text-to-image diffusion models. As a self-contained work, this survey starts with a brief introduction of how diffusion models work for image synthesis, followed by the background for text-conditioned image synthesis. Based on that, we present an organized review of pioneering methods and their improvements on text-to-image generation. We further summarize applications beyond image generation, such as text-guided generation for various modalities like videos, and text-guided image editing. Beyond the progress made so far, we discuss existing challenges and promising future directions.
Taxonomy
Topics: Artificial Intelligence in Games
Methods: Diffusion
