Image Captioning with Semantic Attention

Quanzeng You; Hailin Jin; Zhaowen Wang; Chen Fang; Jiebo Luo

arXiv:1603.03925 · cs.CV · March 15, 2016 · 247 cites

TL;DR

This paper introduces a semantic attention model for image captioning that combines top-down and bottom-up approaches and improves on prior methods on the Microsoft COCO and Flickr30K benchmarks.

Contribution

It proposes a novel semantic attention mechanism that fuses top-down and bottom-up processes for more accurate image captioning.

Findings

01. Outperforms state-of-the-art methods on the COCO and Flickr30K datasets

02. Significantly improves evaluation metrics across benchmarks

03. Demonstrates effective integration of semantic concepts in caption generation

Abstract

Automatically generating a natural language description of an image has attracted interest recently, both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing. Existing approaches are either top-down, which start from a gist of an image and convert it into words, or bottom-up, which come up with words describing various aspects of an image and then combine them. In this paper, we propose a new algorithm that combines both approaches through a model of semantic attention. Our algorithm learns to selectively attend to semantic concept proposals and fuse them into hidden states and outputs of recurrent neural networks. The selection and fusion form a feedback connecting the top-down and bottom-up computation. We evaluate our algorithm on two public benchmarks: Microsoft COCO and…
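
The attend-and-fuse loop described in the abstract can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the paper's formulation: the bilinear scoring, the additive fusion, the plain tanh recurrence, the random weights, and all names (`attend`, `rnn_step`, `U_in`, `U_out`) are hypothetical choices made only to show how attention over concept proposals can feed both the input and the output of a recurrent step.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(query, concepts, U):
    # Score each concept embedding against the query with a bilinear form
    # (an assumption for this sketch), then return the attention weights
    # and the weighted sum of concept embeddings.
    scores = concepts @ (U @ query)    # shape (k,)
    weights = softmax(scores)          # non-negative, sums to 1
    context = weights @ concepts       # shape (d,)
    return weights, context

def rnn_step(h_prev, x, W_h, W_x):
    # Plain tanh recurrence standing in for the recurrent network.
    return np.tanh(W_h @ h_prev + W_x @ x)

rng = np.random.default_rng(0)
d, k = 8, 5                                # embedding size, number of concepts
concepts = rng.normal(size=(k, d))         # hypothetical concept proposal embeddings
U_in = rng.normal(size=(d, d)) * 0.1       # input-attention parameters (random here)
U_out = rng.normal(size=(d, d)) * 0.1      # output-attention parameters (random here)
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, d)) * 0.1

h = np.zeros(d)                            # hidden state
word = rng.normal(size=d)                  # stand-in for the previous word embedding

for t in range(3):
    # Bottom-up concepts are selected using the previous word as the query,
    # then fused into the input of the recurrent step.
    w_in, ctx_in = attend(word, concepts, U_in)
    h = rnn_step(h, word + ctx_in, W_h, W_x)

    # The new hidden state (top-down) selects concepts again, and the result
    # is fused into the output used to produce the next word.
    w_out, ctx_out = attend(h, concepts, U_out)
    out = h + ctx_out
    word = out                             # stand-in for the next word's embedding
```

Because the output of each step becomes the query for the next step's input attention, the selection and fusion form a loop between the recurrent state and the concept proposals, which is the feedback the abstract refers to.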

Videos

Image Captioning With Semantic Attention · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning