Topic-Guided Attention for Image Captioning
Zhihao Zhu, Zhan Xue, Zejian Yuan

TL;DR
This paper introduces a novel topic-guided attention mechanism for image captioning that leverages image topics as high-level guidance, improving feature selection and achieving state-of-the-art results on the COCO dataset.
Contribution
It proposes a new attention model that integrates image topics as guiding information, with separate networks for features and topics, trained end-to-end.
Findings
Achieves state-of-the-art performance on the COCO dataset
Effectively incorporates image topics into the attention mechanism
Improves image feature selection for captioning
Abstract
Attention mechanisms have attracted considerable interest in image captioning because of their powerful performance. Existing attention-based models use feedback information from the caption generator as guidance to determine which of the image features should be attended to. A common defect of these attention generation methods is that they lack higher-level guiding information from the image itself, which limits their ability to select the most informative image features. Therefore, in this paper, we propose a novel attention mechanism, called topic-guided attention, which integrates image topics in the attention model as guiding information to help select the most important image features. Moreover, we extract image features and image topics with separate networks, which can be fine-tuned jointly in an end-to-end manner during training. The experimental results on the benchmark…
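The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation, which this summary does not give. It assumes a standard additive (soft) attention in which each region's score depends on the region feature, the caption generator's hidden state, and an image topic vector. The names `topic_guided_attention`, `W_v`, `W_h`, `W_t`, and `w`, along with all dimensions and random weights, are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_guided_attention(features, hidden, topic, params):
    """Return (weights, context) for one decoding step.

    features: N region feature vectors, each of length D
    hidden:   caption-generator state (feedback guidance), length H
    topic:    image topic vector (image-level guidance), length T
    params:   hypothetical projections W_v (A x D), W_h (A x H),
              W_t (A x T) and a scoring vector w (length A)
    """
    # Guidance shared by all regions: decoder feedback plus image topics.
    guide_h = matvec(params["W_h"], hidden)
    guide_t = matvec(params["W_t"], topic)

    scores = []
    for region in features:
        projected = matvec(params["W_v"], region)
        activation = [
            math.tanh(p + gh + gt)
            for p, gh, gt in zip(projected, guide_h, guide_t)
        ]
        scores.append(sum(w * a for w, a in zip(params["w"], activation)))

    # Normalize scores into attention weights, then average the regions.
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(wt * region[d] for wt, region in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random values (toy weights)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumed): 4 regions, 6-dim features, 5-dim state, 3 topics.
rng = random.Random(0)
N, D, H, T, A = 4, 6, 5, 3, 8
params = {
    "W_v": random_matrix(A, D, rng),
    "W_h": random_matrix(A, H, rng),
    "W_t": random_matrix(A, T, rng),
    "w": [rng.uniform(-0.5, 0.5) for _ in range(A)],
}
features = random_matrix(N, D, rng)
hidden = [rng.uniform(-1, 1) for _ in range(H)]
topic = [0.7, 0.2, 0.1]  # made-up topic distribution for the image

weights, context = topic_guided_attention(features, hidden, topic, params)
print("attention weights:", weights)
print("context vector:", context)
```

In this sketch, dropping the `guide_t` term gives attention guided by caption-generator feedback alone, which is the setup the abstract attributes to existing models. In a trained system the feature and topic extractors would be learned networks rather than random values.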
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
