Text-to-Image Generation with Attention Based Recurrent Neural Networks

Tehseen Zia, Shahan Arif, Shakeeb Murtaza, and Mirza Ahsan Ullah

arXiv:2001.06658·cs.CV·January 22, 2020·6 cites

Open Access

TL;DR

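This paper introduces an attention-based recurrent neural network for text-to-image generation that uses an attention-based encoder to capture word-to-pixel dependencies. The authors report that it is stable to train and outperforms previous methods on the Microsoft COCO and MNIST-with-captions datasets, as measured by the Structural Similarity Index.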

Contribution

The authors propose a novel attention-based encoder and autoregressive decoder for stable, high-quality caption-based image generation, addressing limitations of prior latent variable and GAN models.
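To make the idea of an attention-based encoder concrete, here is a minimal sketch of generic dot-product soft attention over word vectors. It is not the authors' architecture: the function name `soft_attention`, the use of plain dot-product scoring, and the toy vectors are all assumptions made for illustration.

```python
import math

def soft_attention(query, word_vectors):
    """Generic dot-product soft attention (illustrative, not the paper's model).

    query:        list of floats, e.g. a decoder state for the current pixel
    word_vectors: list of equal-length float lists, one per caption word
    Returns (weights, context): weights sum to 1; context is the weighted
    average of the word vectors.
    """
    # Score each word by its dot product with the query.
    scores = [sum(q * w for q, w in zip(query, vec)) for vec in word_vectors]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the word vectors.
    dim = len(word_vectors[0])
    context = [
        sum(wt * vec[d] for wt, vec in zip(weights, word_vectors))
        for d in range(dim)
    ]
    return weights, context

# Hypothetical toy caption of three words embedded in two dimensions.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = soft_attention([2.0, 0.0], words)
```

In this toy call the query points along the first dimension, so the first word receives the largest weight. A word-to-pixel model would compute such weights for each pixel position being generated.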

Findings

01

Outperforms existing approaches on the Microsoft COCO and MNIST-with-captions datasets

02

Generates higher quality images as measured by Structural Similarity Index

03

Demonstrates stable training process with attention-based architecture
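For readers unfamiliar with the Structural Similarity Index (SSIM) named in the findings, the following is a simplified sketch that computes SSIM once over a whole grayscale image. The published metric is usually computed over local windows and averaged, so this single-window version is an assumption made for brevity, as are the function name `global_ssim` and the flattened-list input format. The constants `k1 = 0.01` and `k2 = 0.03` are the commonly used defaults.

```python
def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM for two flattened grayscale images (simplified).

    x, y: equal-length sequences of pixel intensities.
    data_range: difference between the largest and smallest possible value.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    # Small constants that stabilize the division.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    # Means, variances, and covariance of the two images.
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Luminance term times the combined contrast/structure term.
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Hypothetical 2x2 image and its inverted copy, flattened row by row.
original = [0.0, 64.0, 128.0, 255.0]
inverted = [255.0, 191.0, 127.0, 0.0]
same_score = global_ssim(original, original)
diff_score = global_ssim(original, inverted)
```

An image compared with itself scores 1, and structurally dissimilar images score lower, which is why a higher SSIM between generated and reference images is read as better quality.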

Abstract

Conditional image modeling based on textual descriptions is a relatively new domain in unsupervised learning. Previous approaches use latent variable models and generative adversarial networks. While the former are approximated using variational auto-encoders and rely on intractable inference that can hamper their performance, the latter are unstable to train because of their Nash-equilibrium-based objective function. We develop a tractable and stable caption-based image generation model. The model uses an attention-based encoder to learn word-to-pixel dependencies. A conditional autoregressive decoder is used for learning pixel-to-pixel dependencies and generating images. Experiments are performed on the Microsoft COCO and MNIST-with-captions datasets, and performance is evaluated using the Structural Similarity Index. Results show that the proposed model performs better than…
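The tractability claim for a conditional autoregressive decoder can be read through the standard autoregressive factorization, sketched here. The notation is an assumption and is not taken from the paper: x = (x_1, ..., x_n) denotes the image's pixels in some fixed order, and s denotes the encoded caption.

```latex
p(\mathbf{x} \mid \mathbf{s}) = \prod_{i=1}^{n} p\left(x_i \mid x_1, \dots, x_{i-1}, \mathbf{s}\right)
```

Each factor is an ordinary conditional distribution over one pixel, so the likelihood can be evaluated exactly, with no approximate inference over latent variables and no adversarial objective.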

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications