Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph

Wentian Zhao; Yao Hu; Heda Wang; Xinxiao Wu; Jiebo Luo

arXiv:2107.11970·cs.CV·July 27, 2021

TL;DR

This paper builds a multi-modal knowledge graph that links visual objects in an image to named entities in the associated article and captures the relationships between those entities, drawing on external knowledge collected from the web to improve entity-aware image captioning.

Contribution

It proposes a novel multi-modal knowledge graph construction and integration method that enhances the association between visual cues and entities for better captioning.

Findings

01 Significant improvement on the GoodNews dataset

02 Effective cross-modal entity matching module

03 Enhanced caption quality with richer entity information

Abstract

Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article. This task remains challenging as it is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities. Furthermore, the complexity of the article brings difficulty in extracting fine-grained relationships between entities to generate informative event descriptions about the image. To tackle these challenges, we propose a novel approach that constructs a multi-modal knowledge graph to associate the visual objects with named entities and capture the relationship between entities simultaneously with the help of external knowledge collected from the web. Specifically, we build a text sub-graph by extracting named entities and their relationships from the article, and build…
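
As a rough illustration of the graph structure the abstract describes, the sketch below stores a text sub-graph as an adjacency map of (relation, entity) edges and then attaches visual object nodes to named entities. It is a toy sketch under stated assumptions, not the authors' implementation: the function names, the `depicts` edge label, and the sample triples are all hypothetical, and because the abstract is cut off after "and build", the linking step is only one plausible way to represent the association between visual objects and named entities.

```python
from collections import defaultdict


def build_text_subgraph(triples):
    """Build an adjacency map from (head, relation, tail) triples.

    The triples are assumed to come from some upstream entity and
    relation extractor run on the article (not shown here).
    """
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


def link_visual_objects(graph, matches):
    """Return a copy of the graph with visual object nodes attached.

    `matches` is a list of (object_label, entity_name) pairs, assumed to
    be produced by some cross-modal matching step. The "depicts" edge
    label is a hypothetical choice made for this example.
    """
    merged = {node: list(edges) for node, edges in graph.items()}
    for object_label, entity_name in matches:
        merged.setdefault(object_label, []).append(("depicts", entity_name))
    return merged


# Hypothetical toy input: one relation from an article, one detected object.
triples = [("Alice Smith", "spoke_at", "City Hall")]
matches = [("person_0", "Alice Smith")]

mm_graph = link_visual_objects(build_text_subgraph(triples), matches)
for node, edges in mm_graph.items():
    print(node, edges)
```

In this representation a captioning model could, in principle, start from a detected object node, follow the `depicts` edge to a named entity, and then read off that entity's relations to other entities in the article.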
