Text-to-image Diffusion Models in Generative AI: A Survey

Chenshuang Zhang; Chaoning Zhang; Mengchun Zhang; In So Kweon; Junmo Kim

arXiv:2303.07909 · cs.CV · November 11, 2024 · 76 cites

TL;DR

This survey comprehensively reviews the development, methods, applications, challenges, and future directions of text-to-image diffusion models in generative AI, highlighting their evolution and expanding use cases.

Contribution

It provides an organized overview of pioneering methods, improvements, and applications of text-to-image diffusion models, along with discussions on challenges and future prospects.

Findings

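01. Summarizes key advancements in text-to-image diffusion models, from pioneering methods to their later improvements.

02. Highlights applications beyond image generation, such as text-guided video generation and text-guided image editing.

03. Discusses existing challenges and promising future research directions.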

Abstract

This survey reviews the progress of diffusion models in generating images from text, i.e., text-to-image diffusion models. As a self-contained work, this survey starts with a brief introduction of how diffusion models work for image synthesis, followed by the background for text-conditioned image synthesis. Based on that, we present an organized review of pioneering methods and their improvements on text-to-image generation. We further summarize applications beyond image generation, such as text-guided generation for various modalities like videos, and text-guided image editing. Beyond the progress made so far, we discuss existing challenges and promising future directions.

Taxonomy

Topics: Artificial Intelligence in Games

Methods: Diffusion