DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

Shezheng Song; Shasha Li; Jie Yu; Shan Zhao; Xiaopeng Li; Jun Ma; Xiaodong Liu; Zhuo Li; Xiaoguang Mao

arXiv:2407.12019·cs.CL·July 18, 2024

TL;DR

This paper introduces DIM, an approach to multimodal entity linking that uses ChatGPT for dynamic entity extraction and BLIP-2 for visual understanding, and that outperforms the majority of existing methods on the three original datasets.

Contribution

The paper presents a new method combining dynamic entity extraction with LLM-based visual understanding for enhanced multimodal entity linking.

Findings

01

Outperforms the majority of existing methods on the three original datasets

02

Achieves state-of-the-art results on enhanced datasets

03

Demonstrates effective integration of multimodal information

Abstract

Our study delves into Multimodal Entity Linking, which aligns mentions in multimodal content with entities in a knowledge base. Existing methods still face challenges such as ambiguous entity representations and limited use of image information. We therefore propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances the datasets. We also propose a method, Dynamically Integrate Multimodal information with knowledge base (DIM), which employs the visual understanding capability of a Large Language Model (LLM). The LLM, such as BLIP-2, extracts information relevant to entities in the image, which facilitates the extraction of entity features and their linking with the dynamic entity representations provided by ChatGPT. Experiments demonstrate that the proposed DIM method outperforms the majority of existing methods on the three original…
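The linking step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: no ChatGPT or BLIP-2 call is made, the image description and entity descriptions are hand-written strings standing in for model outputs, and a bag-of-words cosine similarity stands in for learned encoders. The names `bow`, `cosine`, and `link_mention` are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words counts; a toy stand-in for a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_mention(mention_text: str, image_description: str, candidates: dict) -> str:
    """Return the candidate entity whose description best matches the mention.

    mention_text:      the textual context containing the mention.
    image_description: text assumed to come from a vision-language model.
    candidates:        entity name -> description, assumed to come from an LLM.
    """
    # Fuse text and image-derived information into a single query vector.
    query = bow(mention_text + " " + image_description)
    return max(candidates, key=lambda name: cosine(query, bow(candidates[name])))


if __name__ == "__main__":
    candidates = {
        "Michael Jordan (basketball player)":
            "American basketball player who won six titles with the Chicago Bulls",
        "Jordan (country)":
            "Country in Western Asia bordering Saudi Arabia and Iraq",
    }
    print(link_mention(
        "Jordan scored 40 points last night",
        "a basketball player dunking in a red jersey",
        candidates,
    ))
```

In this toy example the mention text alone shares no words with either description; the image-derived description is what separates the two candidates, which is the intuition behind adding visual information to the query.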

Taxonomy

Topics: Topic Modeling