CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas

Amanpreet Singh; Sharan Agrawal

arXiv:1810.02833·cs.CV·October 9, 2018

TL;DR

CanvasGAN is a recurrent model that builds an image from a text caption by incrementally adding patches to a canvas while attending to specific words in the caption. The authors position it as a simple baseline for text-to-image generation.

Contribution

The paper presents a recurrent generative model that constructs images incrementally while attending to the words of the caption, and introduces a self-attention-based method for generating visual-semantic sentence embeddings.

Findings

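1. Serves as a stronger text-to-image baseline than Reed et al.'s model, based on a comparison of generated images
2. Generates images patch by patch on a canvas, which is then passed through an upscaling network
3. Introduces a self-attention-based method for visual-semantic sentence embeddings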

Abstract

We propose a new recurrent generative model for generating images from text captions while attending to specific parts of the captions. Our model creates images by incrementally adding patches to a "canvas" while attending to words from the text caption at each timestep. Finally, the canvas is passed through an upscaling network to generate images. We also introduce a new method for generating visual-semantic sentence embeddings based on self-attention over text. We compare our model's generated images with those generated by Reed et al.'s model and show that our model is a stronger baseline for text-to-image generation tasks.
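The loop described in the abstract (attend to words, add a patch to the canvas, repeat, then upscale) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `paint_canvas` and `upscale`, the gated additive update, the dot-product attention, the random weights standing in for trained parameters, and the nearest-neighbour upsampling standing in for the upscaling network are all assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - x.max())
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def paint_canvas(words, steps=4, size=16, seed=0):
    """Hypothetical sketch: words is a (num_words, dim) array of word embeddings."""
    rng = np.random.default_rng(seed)
    num_words, dim = words.shape
    n_pix = size * size * 3
    # Assumption: random weights stand in for trained parameters.
    W_h = rng.normal(0.0, 0.1, (dim, dim))
    W_c = rng.normal(0.0, 0.1, (dim, dim))
    W_patch = rng.normal(0.0, 0.1, (n_pix, dim))
    W_gate = rng.normal(0.0, 0.1, (n_pix, dim))

    h = np.zeros(dim)                   # recurrent state
    canvas = np.zeros((size, size, 3))  # blank canvas
    attn_history = []
    for _ in range(steps):
        # Attend over caption words using the current state (dot-product scores).
        attn = softmax(words @ h)       # shape (num_words,)
        context = attn @ words          # shape (dim,)
        # Simple recurrent update from the previous state and the attended context.
        h = np.tanh(W_h @ h + W_c @ context)
        # Assumed update rule: add a gated patch to the canvas.
        patch = np.tanh(W_patch @ h).reshape(size, size, 3)
        gate = sigmoid(W_gate @ h).reshape(size, size, 3)
        canvas = canvas + gate * patch
        attn_history.append(attn)
    return canvas, np.stack(attn_history)


def upscale(canvas, factor=4):
    # Assumption: nearest-neighbour upsampling stands in for the upscaling network.
    return canvas.repeat(factor, axis=0).repeat(factor, axis=1)


if __name__ == "__main__":
    words = np.random.default_rng(1).normal(size=(7, 32))  # 7 words, 32-dim embeddings
    canvas, attn = paint_canvas(words)
    image = upscale(canvas)
    print(canvas.shape, attn.shape, image.shape)
```

In a trained model the attention weights would shift across timesteps so that different words drive different patches; with the random weights used here, only the data flow and tensor shapes are meaningful.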

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization