VP-MEL: Visual Prompts Guided Multimodal Entity Linking

Hongze Mi; Jinyuan Li; Xuying Zhang; Haoran Cheng; Jiahao Wang; Di Sun; Gang Pan

arXiv:2412.06720·cs.CV·February 18, 2025

TL;DR

This paper introduces VP-MEL, a novel multimodal entity linking task guided by visual prompts, along with a new dataset and a framework that improves entity retrieval by better utilizing visual information.

Contribution

The paper proposes the VP-MEL task, creates the VPWiki dataset, and develops the IIER framework that enhances visual feature extraction for improved multimodal entity linking.

Findings

01. IIER outperforms baseline methods on VPWiki
02. Visual prompts improve entity linking accuracy
03. The new VPWiki dataset facilitates VP-MEL research

Abstract

Multimodal entity linking (MEL), a task aimed at linking mentions within multimodal contexts to their corresponding entities in a knowledge base (KB), has attracted much attention due to its wide applications in recent years. However, existing MEL methods often rely on mention words as retrieval cues, which limits their ability to effectively utilize information from both images and text. This reliance causes MEL to struggle with accurately retrieving entities in certain scenarios, especially when the focus is on image objects or mention words are missing from the text. To solve these issues, we introduce a Visual Prompts guided Multimodal Entity Linking (VP-MEL) task. Given a text-image pair, VP-MEL aims to link a marked region (i.e., visual prompt) in an image to its corresponding entities in the knowledge base. To facilitate this task, we present a new dataset, VPWiki, specifically…
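
To make the task definition concrete, the following is a minimal sketch of what a VP-MEL input and a retrieval step might look like. It is not the IIER framework from the paper. The class names, the bounding-box representation of the visual prompt, the toy embeddings, and the cosine-similarity ranking are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass
class VPMELExample:
    """Hypothetical VP-MEL input: a text-image pair plus a marked region."""
    text: str
    image_path: str
    # Assumed encoding of the visual prompt: (x_min, y_min, x_max, y_max) in pixels.
    visual_prompt: Tuple[int, int, int, int]


@dataclass
class Entity:
    """Hypothetical knowledge-base entry with a precomputed embedding."""
    entity_id: str
    name: str
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_entities(query: List[float], kb: List[Entity]) -> List[Entity]:
    """Return KB entities sorted from most to least similar to the query."""
    return sorted(kb, key=lambda e: cosine(query, e.embedding), reverse=True)


# Toy data: placeholder entities and a stand-in for an encoder output.
kb = [
    Entity("E1", "Entity A", [1.0, 0.0]),
    Entity("E2", "Entity B", [0.0, 1.0]),
]
example = VPMELExample("Two people shaking hands.", "photo.jpg", (40, 30, 120, 200))
query_embedding = [0.1, 0.9]  # would come from a model encoding text, image, and region
top = rank_entities(query_embedding, kb)[0]
```

In a real system the query embedding would be produced by a model that encodes the text, the image, and the marked region together; here it is hard-coded so that the ranking step can be read in isolation.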

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Text Analysis Techniques

Methods: Softmax · Attention Is All You Need · Balanced Selection · ALIGN