CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas
Amanpreet Singh, Sharan Agrawal

TL;DR
CanvasGAN introduces a recurrent model that builds images from text captions by incrementally adding patches to a canvas while attending to specific words in the caption, serving as a simple baseline for text-to-image generation.
Contribution
The paper presents a novel recurrent approach that incrementally constructs images with attention mechanisms and introduces a new self-attention based method for sentence embeddings.
Findings
Presented by the authors as a stronger baseline than Reed et al.'s model, based on a comparison of generated images
Uses a novel patch-by-patch image generation process
Employs a new self-attention based sentence embedding method
Abstract
We propose a new recurrent generative model for generating images from text captions while attending to specific parts of the captions. Our model creates images by incrementally adding patches to a "canvas" while attending to words from the text caption at each timestep. Finally, the canvas is passed through an upscaling network to generate images. We also introduce a new method for generating visual-semantic sentence embeddings based on self-attention over text. We compare our model's generated images with those generated by Reed et al.'s model and show that our model is a stronger baseline for text-to-image generation tasks.
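The loop described in the abstract (attend to caption words, add a patch to the canvas, repeat, then upscale) can be illustrated with a toy sketch. This is not the authors' architecture: the function names, the dot-product attention, the tanh state update, the random patch position, and the nearest-neighbour upscaling are all assumptions standing in for learned components, and no adversarial training is shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, word_vectors):
    """Dot-product attention: weights over words and their weighted average."""
    scores = [sum(q * w for q, w in zip(query, vec)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(word_vectors[0])
    context = [
        sum(a * vec[d] for a, vec in zip(weights, word_vectors))
        for d in range(dim)
    ]
    return weights, context


def paint_canvas(word_vectors, steps=4, size=8, patch=4, seed=0):
    """Add one patch per timestep to a blank canvas (hypothetical toy loop)."""
    rng = random.Random(seed)
    dim = len(word_vectors[0])
    canvas = [[0.0] * size for _ in range(size)]
    state = [0.0] * dim  # recurrent state; starts at zero
    history = []  # attention weights recorded at each timestep
    for _ in range(steps):
        weights, context = attend(state, word_vectors)
        # Assumption: a fixed tanh blend stands in for a learned recurrent cell.
        state = [math.tanh(0.5 * s + c) for s, c in zip(state, context)]
        # Assumption: the patch position is random here; a trained model
        # would predict where and what to draw.
        top = rng.randrange(size - patch + 1)
        left = rng.randrange(size - patch + 1)
        value = sum(state) / dim
        for r in range(top, top + patch):
            for c in range(left, left + patch):
                canvas[r][c] += value
        history.append(weights)
    return canvas, history


def upscale(canvas, factor=2):
    """Nearest-neighbour upscaling, a stand-in for the upscaling network."""
    return [
        [v for v in row for _ in range(factor)]
        for row in canvas
        for _ in range(factor)
    ]


# Three made-up word vectors stand in for an embedded caption.
words = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.5, 0.5, 0.0]]
canvas, history = paint_canvas(words)
image = upscale(canvas)
```

Because the recurrent state starts at zero, the first timestep attends uniformly over the words; later timesteps shift attention as the state changes, which is the behaviour the abstract describes in prose.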
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization
