TL;DR
This paper introduces DAE-GAN, a novel text-to-image synthesis model that effectively utilizes aspect information at multiple granularities to improve image realism and semantic accuracy.
Contribution
The paper proposes a dynamic aspect-aware GAN with a new multi-granularity text representation and an aspect-aware refinement module for enhanced image synthesis.
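As a rough illustration of what a multi-granularity text representation means, the sketch below splits a caption into sentence-level, word-level, and aspect-level units. It is not the paper's extraction method: the function name `multi_granularity`, the tiny `ADJECTIVES` and `NOUNS` lexicons, and the adjective-plus-noun rule are all assumptions made for this example. A real system would more plausibly rely on a part-of-speech tagger and learned embeddings.

```python
import re

# Hypothetical mini-lexicons, for illustration only; not taken from the paper.
ADJECTIVES = {"red", "small", "white", "black", "long", "yellow"}
NOUNS = {"eyes", "bird", "belly", "wings", "beak", "crown"}


def multi_granularity(text):
    """Split a caption into sentence-, word-, and aspect-level units.

    An "aspect" is approximated here as a run of adjectives followed
    directly by a noun (e.g. "red eyes"). This rule is an assumption.
    """
    words = re.findall(r"[a-z]+", text.lower())
    aspects = []
    i = 0
    while i < len(words):
        # Scan forward over a run of consecutive adjectives.
        j = i
        while j < len(words) and words[j] in ADJECTIVES:
            j += 1
        # Keep the phrase only if at least one adjective precedes a noun.
        if j > i and j < len(words) and words[j] in NOUNS:
            aspects.append(" ".join(words[i:j + 1]))
            i = j + 1
        else:
            i += 1
    return {"sentence": text, "words": words, "aspects": aspects}


result = multi_granularity("A small bird with red eyes and a white belly.")
print(result["aspects"])
```

Each of the three levels would then be embedded separately, so that a refinement stage can attend to a phrase such as "red eyes" as a single unit rather than as two unrelated words.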
Findings
Outperforms existing methods on CUB-200 and COCO datasets.
Effectively incorporates aspect information for detailed image generation.
Achieves higher semantic consistency and visual quality.
Abstract
Text-to-image synthesis generates an image from a given text description, with photorealism and semantic consistency as its key goals. Previous methods usually generate an initial image from a sentence embedding and then refine it with fine-grained word embeddings. Despite significant progress, these methods often ignore the 'aspect' information contained in the text (e.g., red eyes), which spans several words rather than a single word and depicts 'a particular part or feature of something'. This information is highly helpful for synthesizing image details. How to make better use of aspect information in text-to-image synthesis remains an unresolved challenge. To address this problem, we propose a Dynamic Aspect-awarE GAN (DAE-GAN) that represents text information comprehensively at multiple granularities, including sentence-level, word-level, and…
