Multi-Tailed, Multi-Headed, Spatial Dynamic Memory refined Text-to-Image Synthesis

Amrit Diggavi Seshadri; Balaraman Ravindran

arXiv:2110.08143·cs.CV·October 18, 2021

TL;DR

This paper introduces MSMT-GAN, a text-to-image synthesis method that uses multi-tailed, multi-headed, spatial dynamic memory to generate more accurate and detailed images. It targets three limitations of previous multi-stage approaches.

Contribution

The paper proposes a new initial generation stage that uses word-level image features, a spatial dynamic memory module for refinement, and an iterative multi-headed mechanism for better image quality.

Findings

01 Outperforms previous state-of-the-art on CUB and COCO datasets.

02 Achieves more detailed and accurate image synthesis.

03 Effectively disentangles object attributes at word-level.

Abstract

Synthesizing high-quality, realistic images from text descriptions is a challenging task. Current methods synthesize images from text in a multi-stage manner, typically by first generating a rough initial image and then refining image details at subsequent stages. However, existing methods that follow this paradigm suffer from three important limitations. Firstly, they synthesize initial images without attempting to separate image attributes at the word level. As a result, the object attributes of initial images (which provide the basis for subsequent refinement) are inherently entangled and ambiguous. Secondly, by using common text representations for all regions, current methods prevent text from being interpreted in fundamentally different ways at different parts of an image. Different image regions are therefore only allowed to assimilate the same type of information from text at…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques