Multi-Tailed, Multi-Headed, Spatial Dynamic Memory refined Text-to-Image Synthesis
Amrit Diggavi Seshadri, Balaraman Ravindran

TL;DR
This paper introduces MSMT-GAN, a text-to-image synthesis method that uses multi-tailed, multi-headed spatial dynamic memory to generate more accurate and detailed images, addressing three limitations of previous multi-stage approaches.
Contribution
The paper proposes three components: a new initial generation stage with word-level image features, a spatial dynamic memory module for refinement, and an iterative multi-headed mechanism for better image quality.
Findings
Outperforms the previous state of the art on the CUB and COCO datasets.
Achieves more detailed and accurate image synthesis.
Effectively disentangles object attributes at the word level.
Abstract
Synthesizing high-quality, realistic images from text descriptions is a challenging task, and current methods synthesize images from text in a multi-stage manner, typically by first generating a rough initial image and then refining image details at subsequent stages. However, existing methods that follow this paradigm suffer from three important limitations. Firstly, they synthesize initial images without attempting to separate image attributes at the word level. As a result, object attributes of initial images (which provide a basis for subsequent refinement) are inherently entangled and ambiguous in nature. Secondly, by using common text representations for all regions, current methods prevent us from interpreting text in fundamentally different ways at different parts of an image. Different image regions are therefore only allowed to assimilate the same type of information from text at…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
