Attention Correctness in Neural Image Captioning

Chenxi Liu; Junhua Mao; Fei Sha; Alan Yuille

arXiv:1605.09553 · cs.CV · November 24, 2016 · 73 citations

TL;DR

This paper evaluates and enhances the correctness of attention mechanisms in neural image captioning by introducing a quantitative metric and supervised training methods, leading to improved attention and caption quality.

Contribution

It proposes a new metric for attention correctness and introduces supervised training approaches to improve attention in image captioning models.

Findings

01. Supervised attention training improves attention correctness.

02. Enhanced attention leads to better caption quality.

03. Quantitative evaluation correlates attention correctness with caption performance.

Abstract

Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision. But despite their popularity, the "correctness" of the implicitly-learned attention maps has only been assessed qualitatively by visualization of several examples. In this paper we focus on evaluating and improving the correctness of attention in neural image captioning models. Specifically, we propose a quantitative evaluation metric for the consistency between the generated attention maps and human annotations, using recently released datasets with alignment between regions in images and entities in captions. We then propose novel models with different levels of explicit supervision for learning attention maps during training. The supervision can be strong when alignment between regions and caption entities is available, or weak when only object…
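To make the idea of a quantitative consistency metric concrete, here is a minimal sketch of one plausible way to score an attention map against a human-annotated region: the share of total attention weight that falls inside the region. The function name `attention_correctness`, the grid representation, and the normalization are assumptions made for illustration; the abstract does not spell out the metric's exact form.

```python
from typing import Sequence


def attention_correctness(
    attention: Sequence[Sequence[float]],
    region_mask: Sequence[Sequence[int]],
) -> float:
    """Share of attention weight that lands inside an annotated region.

    Illustrative assumption, not necessarily the paper's definition.
    attention:   2D grid of non-negative attention weights.
    region_mask: 2D grid of the same shape; truthy cells lie inside the
                 human-annotated region for the word being generated.
    """
    if len(attention) != len(region_mask):
        raise ValueError("attention and region_mask must have the same shape")

    total = 0.0   # all attention weight on the grid
    inside = 0.0  # attention weight inside the annotated region
    for att_row, mask_row in zip(attention, region_mask):
        if len(att_row) != len(mask_row):
            raise ValueError("attention and region_mask must have the same shape")
        for weight, in_region in zip(att_row, mask_row):
            if weight < 0:
                raise ValueError("attention weights must be non-negative")
            total += weight
            if in_region:
                inside += weight

    if total == 0:
        raise ValueError("attention map has no weight")
    # Dividing by the total makes the score independent of overall scale.
    return inside / total


# Toy 2x2 attention map; the annotated region covers the bottom row.
toy_attention = [[0.125, 0.125],
                 [0.25, 0.5]]
toy_mask = [[0, 0],
            [1, 1]]
toy_score = attention_correctness(toy_attention, toy_mask)
```

Under this sketch a score near 1 means the model attends almost entirely within the annotated region, while a score near the region's area fraction is what a uniform attention map would get.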

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling