Text-to-Image Synthesis: A Decade Survey

Nonghai Zhang; Hao Tang

arXiv:2411.16164 · cs.CV · November 26, 2024 · 3 cites

Open Access

TL;DR

This survey reviews over 440 recent works on text-to-image synthesis (T2I), tracing advances across GANs, autoregressive models, and diffusion models, and discussing the field's challenges, applications, and future directions.

Contribution

It provides a comprehensive overview of recent developments in T2I, including model evolution, performance metrics, and emerging research challenges and opportunities.

Findings

01. GANs, autoregressive, and diffusion models are key to T2I development.

02. Advances have improved image quality, diversity, and controllability.

03. Current challenges include safety, personalization, and content consistency.

Abstract

When humans read a specific text, they often visualize the corresponding images, and we hope that computers can do the same. Text-to-image synthesis (T2I), which focuses on generating high-quality images from textual descriptions, has become a significant aspect of Artificial Intelligence Generated Content (AIGC) and a transformative direction in artificial intelligence research. Foundation models play a crucial role in T2I. In this survey, we review over 440 recent works on T2I. We start by briefly introducing how GANs, autoregressive models, and diffusion models have been used for image generation. Building on this foundation, we discuss the development of these models for T2I, focusing on their generative capabilities and diversity when conditioned on text. We also explore cutting-edge research on various aspects of T2I, including performance, controllability, personalized…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction

Methods: Diffusion