VP-MEL: Visual Prompts Guided Multimodal Entity Linking
Hongze Mi, Jinyuan Li, Xuying Zhang, Haoran Cheng, Jiahao Wang, Di Sun, Gang Pan

TL;DR
This paper introduces VP-MEL, a novel multimodal entity linking task guided by visual prompts, along with a new dataset and a framework that improves entity retrieval by better utilizing visual information.
Contribution
The paper proposes the VP-MEL task, creates the VPWiki dataset, and develops the IIER framework that enhances visual feature extraction for improved multimodal entity linking.
Findings
IIER outperforms baseline methods on VPWiki
Visual prompts improve entity linking accuracy
The new VPWiki dataset facilitates VP-MEL research
Abstract
Multimodal entity linking (MEL), a task aimed at linking mentions within multimodal contexts to their corresponding entities in a knowledge base (KB), has attracted much attention due to its wide applications in recent years. However, existing MEL methods often rely on mention words as retrieval cues, which limits their ability to effectively utilize information from both images and text. This reliance causes MEL to struggle with accurately retrieving entities in certain scenarios, especially when the focus is on image objects or mention words are missing from the text. To solve these issues, we introduce a Visual Prompts guided Multimodal Entity Linking (VP-MEL) task. Given a text-image pair, VP-MEL aims to link a marked region (i.e., visual prompt) in an image to its corresponding entities in the knowledge base. To facilitate this task, we present a new dataset, VPWiki, specifically…
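The task definition above (a text-image pair plus a marked region, linked to knowledge-base entities) can be made concrete with a small sketch. This is an illustration under stated assumptions, not the paper's IIER framework: the class names, the bounding-box form of the visual prompt, the toy embeddings, and the cosine-similarity ranking are all hypothetical stand-ins chosen only to show the shape of the inputs and outputs.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass
class VPMELExample:
    """One VP-MEL input: a text-image pair with a marked image region."""
    text: str
    image_path: str
    # Assumption: the visual prompt is stored as a box (x_min, y_min, x_max, y_max).
    visual_prompt: Tuple[int, int, int, int]


@dataclass
class Entity:
    """One knowledge-base entry with a precomputed (toy) embedding."""
    entity_id: str
    name: str
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_entities(query: List[float], kb: List[Entity], top_k: int = 3) -> List[str]:
    """Return the ids of the top_k entities most similar to the query embedding."""
    ranked = sorted(kb, key=lambda e: cosine(query, e.embedding), reverse=True)
    return [e.entity_id for e in ranked[:top_k]]


# Toy data: ids, names, and vectors are invented for illustration only.
example = VPMELExample(
    text="A caption that does not name the person shown.",
    image_path="photo.jpg",
    visual_prompt=(40, 30, 200, 260),
)
kb = [
    Entity("E1", "entity one", [1.0, 0.0]),
    Entity("E2", "entity two", [0.0, 1.0]),
    Entity("E3", "entity three", [0.7, 0.7]),
]
# Assumption: some encoder has mapped the text, image, and marked region
# to this query vector; the encoder itself is out of scope here.
query_embedding = [0.9, 0.1]
top = rank_entities(query_embedding, kb, top_k=2)
```

In this sketch the mention is identified by the marked region rather than by mention words in the text, which is the distinction the abstract draws between VP-MEL and standard MEL.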
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Text Analysis Techniques
Methods: Softmax · Attention Is All You Need · Balanced Selection · ALIGN
