Topic-Guided Attention for Image Captioning

Zhihao Zhu; Zhan Xue; Zejian Yuan

arXiv:1807.03514·cs.CV·July 11, 2018

TL;DR

This paper introduces a novel topic-guided attention mechanism for image captioning that leverages image topics as high-level guidance, improving feature selection and achieving state-of-the-art results on the COCO dataset.

Contribution

The paper proposes a new attention model that uses image topics as guiding information. Image features and image topics are extracted by separate networks, which are fine-tuned jointly, end to end, during training.

Findings

01 Achieves state-of-the-art performance on the COCO dataset

02 Effectively incorporates image topics into the attention mechanism

03 Improves image feature selection for captioning

Abstract

Attention mechanisms have attracted considerable interest in image captioning because of their powerful performance. Existing attention-based models use feedback information from the caption generator as guidance to determine which of the image features should be attended to. A common defect of these attention generation methods is that they lack higher-level guiding information from the image itself, which limits their ability to select the most informative image features. Therefore, in this paper, we propose a novel attention mechanism, called topic-guided attention, which integrates image topics into the attention model as guiding information to help select the most important image features. Moreover, we extract image features and image topics with separate networks, which can be fine-tuned jointly in an end-to-end manner during training. The experimental results on the benchmark…
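To make the idea in the abstract concrete, here is a minimal NumPy sketch of soft attention in which a topic vector enters the scoring function alongside the decoder's hidden state. It is an illustration under stated assumptions, not the paper's formulation: the additive scoring form, the function name `topic_guided_attention`, the projection matrices `W_v`, `W_h`, `W_t`, the scoring vector `w`, and all dimensions are hypothetical choices made for this example.

```python
import numpy as np


def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def topic_guided_attention(features, hidden, topic, W_v, W_h, W_t, w):
    """Additive soft attention with an extra topic term (illustrative only).

    features: (k, d_v) one feature vector per image region
    hidden:   (d_h,)   decoder state (feedback from the caption generator)
    topic:    (d_t,)   image topic vector (guidance from the image itself)
    W_v, W_h, W_t: projections into a shared d_a-dimensional space
    w:        (d_a,)   scoring vector
    """
    # Assumed scoring form: project all three inputs, sum, squash, score.
    pre = features @ W_v + hidden @ W_h + topic @ W_t   # (k, d_a)
    scores = np.tanh(pre) @ w                           # (k,)
    alpha = softmax(scores)                             # attention weights
    context = alpha @ features                          # (d_v,) weighted sum
    return alpha, context


# Toy usage with random inputs; every size here is arbitrary.
rng = np.random.default_rng(0)
k, d_v, d_h, d_t, d_a = 5, 8, 6, 4, 7
features = rng.normal(size=(k, d_v))
hidden = rng.normal(size=d_h)
topic = rng.normal(size=d_t)
W_v = rng.normal(size=(d_v, d_a))
W_h = rng.normal(size=(d_h, d_a))
W_t = rng.normal(size=(d_t, d_a))
w = rng.normal(size=d_a)

alpha, context = topic_guided_attention(features, hidden, topic, W_v, W_h, W_t, w)
```

In this sketch the topic term is the same for every region, but because it is added before the `tanh` nonlinearity it can still change the relative scores of the regions, which is one simple way a global signal can influence which features are attended to.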

Code & Models

Repositories

jsaikmr/Building-a-Topic-Modeling-for-Images-using-LDA-and-Transfer-Learning

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques