Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel

TL;DR
This review paper discusses the progress, challenges, and future directions of adversarial text-to-image synthesis, emphasizing the importance of evaluation metrics, dataset quality, and architectural improvements.
Contribution
It proposes a taxonomy based on the level of supervision, critically analyzes current evaluation strategies, and identifies key research gaps in adversarial text-to-image synthesis.
Findings
Significant progress in visual realism, diversity, and semantic alignment.
Open challenges in generating high-resolution images with multiple objects.
A need for reliable evaluation metrics that correlate with human judgement, and for better datasets.
Abstract
With the advent of generative adversarial networks, synthesizing images from textual descriptions has recently become an active research area. It is a flexible and intuitive way for conditional image generation with significant progress in the last years regarding visual realism, diversity, and semantic alignment. However, the field still faces several challenges that require further research efforts such as enabling the generation of high-resolution images with multiple objects, and developing suitable and reliable evaluation metrics that correlate with human judgement. In this review, we contextualize the state of the art of adversarial text-to-image synthesis models, their development since their inception five years ago, and propose a taxonomy based on the level of supervision. We critically examine current strategies to evaluate text-to-image synthesis models, highlight…
