DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

Ming Tao; Hao Tang; Fei Wu; Xiao-Yuan Jing; Bing-Kun Bao; Changsheng Xu

arXiv:2008.05865 · cs.CV · October 18, 2022 · 26 cites

TL;DR

DF-GAN introduces a simplified, one-stage text-to-image synthesis model that improves realism and semantic consistency without complex architectures or extra networks, outperforming current state-of-the-art methods.

Contribution

The paper presents a novel one-stage backbone, a target-aware discriminator, and a deep fusion block, making text-to-image synthesis more efficient and effective.

Findings

01. Outperforms state-of-the-art methods on benchmark datasets.
02. Synthesizes high-resolution, realistic, and text-matching images.
03. Simplifies the architecture while enhancing performance.

Abstract

Synthesizing high-quality, realistic images from text descriptions is a challenging task. Existing text-to-image Generative Adversarial Networks generally employ a stacked architecture as the backbone, yet three flaws remain. First, the stacked architecture introduces entanglements between generators of different image scales. Second, existing studies tend to apply and fix extra networks in adversarial learning for text-image semantic consistency, which limits the supervision capability of these networks. Third, the cross-modal attention-based text-image fusion widely adopted by previous works is restricted to a few specific image scales because of its computational cost. To address these issues, we propose a simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN). Specifically, we propose: (i) a novel one-stage text-to-image backbone that directly…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Digital Media Forensic Detection