Generating Images from Captions with Attention
Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov

TL;DR
This paper presents an attention-based generative model that creates images from captions by iteratively drawing patches on a canvas while attending to the relevant words. It produces higher quality samples than baseline generative models and composes novel scenes for captions not seen in the dataset.
Contribution
The paper introduces an attention-driven image generation model that, after training on Microsoft COCO, produces higher quality samples than several baseline generative models and generates novel scene compositions for previously unseen captions.
Findings
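Produces higher quality samples than the baseline generative models it is compared against
Generates images with novel scene compositions for captions not seen in the dataset
Evaluated against baselines on image retrieval as well as image generation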
Abstract
Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description. After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstrate that our model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset.
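The drawing loop described in the abstract (attend to words, then add a patch to a canvas, repeated over several steps) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `draw_from_caption`, the weight matrices `W_query`, `W_patch`, and `W_pos`, the random untrained parameters, and the hard rectangular patch placement are all assumptions made for illustration. It only shows the control flow of attending and drawing, not a trained model.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def draw_from_caption(word_vecs, steps=8, canvas_size=16, patch_size=4, seed=0):
    """Toy attend-then-draw loop (hypothetical, untrained parameters).

    word_vecs: array of shape (n_words, dim), one vector per caption word.
    Returns the final image and the per-step attention weights over words.
    """
    rng = np.random.default_rng(seed)
    n_words, dim = word_vecs.shape

    # Hypothetical parameters; a real model would learn these.
    W_query = rng.normal(scale=0.1, size=(dim, dim))
    W_patch = rng.normal(scale=0.1, size=(dim, patch_size * patch_size))
    W_pos = rng.normal(scale=0.1, size=(dim, 2))

    canvas = np.zeros((canvas_size, canvas_size))
    state = np.zeros(dim)
    attention_history = []

    for _ in range(steps):
        # 1. Attend: score every word against the current state.
        scores = word_vecs @ (W_query @ state)
        weights = softmax(scores)
        context = weights @ word_vecs

        # 2. Update the drawing state with the attended caption context.
        state = np.tanh(state + context)

        # 3. Draw: produce a patch and choose where to place it.
        patch = (state @ W_patch).reshape(patch_size, patch_size)
        max_offset = canvas_size - patch_size
        frac = sigmoid(state @ W_pos)
        top, left = np.clip(np.floor(frac * (max_offset + 1)), 0, max_offset).astype(int)

        # 4. Accumulate the patch on the canvas (additive updates).
        canvas[top:top + patch_size, left:left + patch_size] += patch
        attention_history.append(weights)

    # Squash the accumulated canvas into pixel intensities in (0, 1).
    return sigmoid(canvas), np.array(attention_history)


if __name__ == "__main__":
    caption_vectors = np.random.default_rng(1).normal(size=(5, 8))
    image, attention = draw_from_caption(caption_vectors, steps=6)
    print(image.shape, attention.shape)
```

Each row of the returned attention array shows which words the loop weighted at that drawing step, which is the kind of word-to-patch alignment the abstract refers to. With the state initialised to zero, the first step attends to all words uniformly.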
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
