DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image Generation
Zhenxing Zhang, Lambert Schomaker

TL;DR
DTGAN is a single-generator, attention-based model for text-to-image synthesis that improves image quality and semantic consistency while reducing complexity and training time compared to multi-stage approaches.
Contribution
The paper proposes DTGAN, a novel single-generator/discriminator framework with attention modules and a new visual loss for improved text-to-image generation.
Findings
Outperforms state-of-the-art multi-stage models on benchmark datasets.
Attention modules effectively localize discriminative regions and capture global visual content.
Enhances image resolution with a new visual loss ensuring vivid shapes and colors.
Abstract
Most existing text-to-image generation methods adopt a multi-stage modular architecture, which has three significant problems: 1) Training multiple networks increases the run time and affects the convergence and stability of the generative model; 2) These approaches ignore the quality of early-stage generator images; 3) Many discriminators need to be trained. To this end, we propose the Dual Attention Generative Adversarial Network (DTGAN), which can synthesize high-quality and semantically consistent images employing only a single generator/discriminator pair. The proposed model introduces channel-aware and pixel-aware attention modules that can guide the generator to focus on text-relevant channels and pixels based on the global sentence vector and to fine-tune original feature maps using attention weights. Also, Conditional Adaptive Instance-Layer Normalization (CAdaILN) is presented…
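To make the idea of sentence-conditioned attention concrete, the following is a minimal NumPy sketch of gating a feature map per channel and per pixel using a global sentence vector. It is an illustration only, not the paper's formulation: the function names, the projection matrix `w_proj`, the sigmoid gating, and all tensor sizes are assumptions introduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, sentence, w_proj):
    """Reweight channels of `features` (C, H, W) using `sentence` (D,).

    `w_proj` (C, D) is a hypothetical learned projection (assumption).
    """
    c = features.shape[0]
    # Summarize each channel by its spatial mean.
    pooled = features.reshape(c, -1).mean(axis=1)        # (C,)
    # Project the sentence vector into channel space.
    query = w_proj @ sentence                            # (C,)
    # One gate in (0, 1) per channel.
    weights = sigmoid(pooled * query)                    # (C,)
    return features * weights[:, None, None], weights

def pixel_attention(features, sentence, w_proj):
    """Reweight spatial positions of `features` (C, H, W) using `sentence` (D,)."""
    query = w_proj @ sentence                            # (C,)
    # Similarity between the projected sentence and the feature at each pixel.
    scores = np.einsum("c,chw->hw", query, features)     # (H, W)
    # One gate in (0, 1) per pixel.
    weights = sigmoid(scores)                            # (H, W)
    return features * weights[None, :, :], weights

# Toy usage with arbitrary sizes (all values are illustrative).
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))     # C=8 channels, 4x4 feature map
sent = rng.standard_normal(16)             # D=16 sentence embedding
w_c = rng.standard_normal((8, 16)) * 0.1   # stand-in for learned weights
w_p = rng.standard_normal((8, 16)) * 0.1

ch_out, ch_w = channel_attention(feats, sent, w_c)
px_out, px_w = pixel_attention(ch_out, sent, w_p)
print(ch_w.shape, px_w.shape, px_out.shape)
```

Applying the channel gate before the pixel gate is an arbitrary choice in this sketch. In both cases the attention weights rescale the original feature map instead of replacing it, which is one plausible reading of "fine-tune original feature maps using attention weights".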
