MirrorGAN: Learning Text-to-image Generation by Redescription
Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao

TL;DR
MirrorGAN is a text-to-image generation framework that enforces semantic consistency by redescription: text is regenerated from the generated image and aligned with the input description, while cascaded attention modules guide image generation.
Contribution
The paper proposes MirrorGAN, a new framework that enforces semantic alignment in text-to-image synthesis via a redescription mechanism and cascaded attention modules.
Findings
Reports better results than representative state-of-the-art methods on two public benchmark datasets
Achieves higher semantic consistency in generated images
Demonstrates effective text-image-text alignment
Abstract
Generating an image from a given text description has two goals: visual realism and semantic consistency. Although significant progress has been made in generating high-quality and visually realistic images using generative adversarial networks, guaranteeing semantic consistency between the text description and visual content remains very challenging. In this paper, we address this problem by proposing a novel global-local attentive and semantic-preserving text-to-image-to-text framework called MirrorGAN. MirrorGAN exploits the idea of learning text-to-image generation by redescription and consists of three modules: a semantic text embedding module (STEM), a global-local collaborative attentive module for cascaded image generation (GLAM), and a semantic text regeneration and alignment module (STREAM). STEM generates word- and sentence-level embeddings. GLAM has a cascaded architecture…
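The three-module data flow described in the abstract (text to embeddings, embeddings to image, image back to text) can be illustrated with a toy sketch. This is not the authors' implementation: every function below is a hypothetical stand-in built from untrained random NumPy parameters, the mean-pooled sentence embedding and the three-stage loop are assumptions made for illustration, and no adversarial training is performed. It shows only how the modules connect and how a redescription loss compares regenerated text with the input.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, REGIONS = 20, 8, 16  # toy sizes, chosen arbitrarily

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical parameters: random and untrained.
E = rng.normal(size=(VOCAB, DIM))      # word embedding table
W_out = rng.normal(size=(DIM, VOCAB))  # projection from image features to vocabulary

def stem(tokens):
    """Stand-in for STEM: word- and sentence-level embeddings."""
    words = E[tokens]              # (T, DIM)
    sentence = words.mean(axis=0)  # (DIM,), mean pooling is an assumption
    return words, sentence

def glam(words, sentence, stages=3):
    """Stand-in for GLAM: cascaded refinement of region features using
    word-level (local) and sentence-level (global) attention."""
    feats = np.tile(sentence, (REGIONS, 1)) + 0.1 * rng.normal(size=(REGIONS, DIM))
    for _ in range(stages):
        attn = softmax(feats @ words.T, axis=-1)  # (REGIONS, T) weights over words
        local_ctx = attn @ words                  # (REGIONS, DIM)
        gate = softmax(feats @ sentence)          # (REGIONS,) weights over regions
        global_ctx = gate[:, None] * sentence     # (REGIONS, DIM)
        feats = np.tanh(feats + local_ctx + global_ctx)
    return feats

def stream(feats, length):
    """Stand-in for STREAM: a word distribution per caption position,
    derived from pooled image features (identical at every position here)."""
    logits = np.tile(feats.mean(axis=0) @ W_out, (length, 1))  # (T, VOCAB)
    return softmax(logits, axis=-1)

def redescription_loss(probs, tokens):
    """Cross-entropy between the regenerated word distributions and the input tokens."""
    return float(-np.log(probs[np.arange(len(tokens)), tokens]).mean())

tokens = np.array([3, 7, 1, 12])           # a toy "description" as token ids
words, sentence = stem(tokens)
image_feats = glam(words, sentence)
probs = stream(image_feats, len(tokens))
loss = redescription_loss(probs, tokens)
print(image_feats.shape, probs.shape, loss)
```

In a trained system the loss above would be one term pushing the generator toward images whose regenerated description matches the input text; here it only demonstrates the text-to-image-to-text round trip.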
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
