Generating Images from Captions with Attention

Elman Mansimov; Emilio Parisotto; Jimmy Lei Ba; Ruslan Salakhutdinov

arXiv:1511.02793·cs.LG·March 1, 2016·ICLR·75 cites


TL;DR

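This paper presents an attention-based generative model that creates images from captions by iteratively drawing patches on a canvas while attending to the relevant words. Its samples are higher quality than those of baseline generative models and include novel scene compositions for captions not seen in the dataset.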

Contribution

The paper introduces an attention-driven image generation model that, after training on Microsoft COCO, is compared with several baseline generative models on image generation and retrieval tasks and produces higher quality samples.

Findings

01 Produces higher quality images than baseline models

02 Generates images with novel scene compositions

03 Effective in image retrieval tasks

Abstract

Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description. After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstrate that our model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset.
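The loop described in the abstract (draw a patch, attend to the words, repeat) can be illustrated with a toy sketch. This is not the authors' model: the random projections `W_query`, `W_patch`, and `W_pos`, the function name `draw_from_caption`, and the simple `tanh` state update are all hypothetical stand-ins for learned components, chosen only to show how soft attention over word vectors can steer an additive canvas.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def draw_from_caption(word_vecs, steps=8, canvas_hw=(16, 16), patch=4, seed=0):
    """Toy loop: at each step, attend over the words, then add a patch to the canvas.

    word_vecs: array of shape (n_words, dim), one vector per caption word.
    Returns the final image in (0, 1) and the attention weights per step.
    """
    rng = np.random.default_rng(seed)
    n_words, dim = word_vecs.shape
    height, width = canvas_hw

    # Hypothetical random projections stand in for learned parameters.
    W_query = rng.normal(scale=0.1, size=(dim, dim))
    W_patch = rng.normal(scale=0.1, size=(dim, patch * patch))
    W_pos = rng.normal(scale=0.1, size=(dim, 2))

    canvas = np.zeros((height, width))
    state = np.zeros(dim)
    attn_history = []

    for _ in range(steps):
        # Soft attention: score each word against the current state.
        scores = word_vecs @ (W_query @ state)
        alpha = softmax(scores)
        context = alpha @ word_vecs

        # Simplified state update (assumption; a real model would use an RNN).
        state = np.tanh(state + context)

        # Choose where to draw, then add the patch to the canvas.
        pos = sigmoid(state @ W_pos)
        top = int(pos[0] * (height - patch))
        left = int(pos[1] * (width - patch))
        canvas[top:top + patch, left:left + patch] += (state @ W_patch).reshape(patch, patch)

        attn_history.append(alpha)

    # Squash the accumulated canvas into pixel intensities.
    return sigmoid(canvas), np.stack(attn_history)


if __name__ == "__main__":
    words = np.random.default_rng(1).normal(size=(5, 8))  # 5 made-up word vectors
    image, attn = draw_from_caption(words)
    print(image.shape, attn.shape)
```

The canvas is updated additively and squashed only at the end, so each step refines what earlier steps drew rather than overwriting it. The returned attention matrix has one row per step, which makes it possible to inspect which words the toy model weighted most while drawing each patch.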


Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization