Boost Image Captioning with Knowledge Reasoning

Feicheng Huang; Zhixin Li; Haiyang Wei; Canlong Zhang; Huifang Ma

arXiv:2011.00927 · cs.CV · November 3, 2020

TL;DR

This paper improves image captioning by introducing word attention to sharpen visual focus and by integrating external knowledge from knowledge graphs, producing more accurate and meaningful descriptions and outperforming many existing methods.

Contribution

It proposes a novel word attention mechanism and knowledge graph integration to improve caption quality in image captioning models.

Findings

01. Achieves state-of-the-art performance on COCO and Flickr30k datasets.

02. Outperforms many existing image captioning approaches.

03. Demonstrates the effectiveness of knowledge reasoning in caption generation.

Abstract

Automatically generating a human-like description for a given image is a promising research direction in artificial intelligence that has attracted a great deal of attention recently. Most existing attention methods explore the mapping between words in the sentence and regions in the image, but this unpredictable matching sometimes produces inharmonious alignments that may reduce the quality of the generated captions. In this paper, we aim to reason about more accurate and meaningful captions. We first propose word attention to improve the correctness of visual attention when generating sequential descriptions word by word. This word attention emphasizes word importance when focusing on different regions of the input image, and makes full use of the internal annotation knowledge to assist the calculation of visual attention. Then, in order to reveal those…
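As a rough illustration of the idea in the abstract (word importance guiding where visual attention looks), the following is a minimal sketch and not the authors' formulation. The function names, the dot-product scoring, and the way the word-weighted query feeds visual attention are all assumptions made for illustration; the paper's actual equations are not given in this excerpt.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def word_weighted_query(word_vecs, hidden):
    """Hypothetical word attention: score each word embedding against the
    decoder state, then return the importance-weighted average as a query."""
    importance = softmax([dot(w, hidden) for w in word_vecs])
    dim = len(hidden)
    query = [sum(a * w[i] for a, w in zip(importance, word_vecs))
             for i in range(dim)]
    return importance, query


def visual_attention(regions, query):
    """Standard soft attention: weight each region feature by its
    similarity to the query and return the weighted context vector."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[i] for w, r in zip(weights, regions))
               for i in range(dim)]
    return weights, context


# Toy inputs (made up): two image regions, two word embeddings, one state.
regions = [[1.0, 0.0], [0.0, 1.0]]
word_vecs = [[2.0, 0.0], [0.0, 0.5]]
hidden = [1.0, 0.0]

importance, query = word_weighted_query(word_vecs, hidden)
weights, context = visual_attention(regions, query)
```

In this toy setup the first word aligns with the decoder state, so it receives more importance, and the resulting query shifts visual attention toward the first region. How the paper actually combines the two attention steps may differ.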

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques