TL;DR
This paper introduces MSR-MEL, a novel unsupervised multimodal entity linking framework that synthesizes and reasons over multi-perspective evidence using large language models and graph neural networks.
Contribution
The paper proposes a two-stage framework that pairs multi-perspective evidence synthesis with reasoning over that evidence, using LLMs and graph neural networks to improve unsupervised MEL performance.
Findings
MSR-MEL outperforms state-of-the-art unsupervised methods on MEL benchmarks.
The framework effectively integrates diverse evidence types for accurate entity linking.
Graph-based evidence aggregation enhances the reasoning process in MEL.
Abstract
Multimodal Entity Linking (MEL) is a fundamental task in data management that maps ambiguous mentions with diverse modalities to multimodal entities in a knowledge base. However, most existing MEL approaches focus on optimizing instance-centric features and evidence, leaving broader forms of evidence and their intricate interdependencies insufficiently explored. Motivated by the observation that the decision-making process of human experts relies on multi-perspective judgment, we propose MSR-MEL, a Multi-perspective Evidence Synthesis and Reasoning framework with Large Language Models (LLMs) for unsupervised MEL. Specifically, we adopt a two-stage framework: (1) Offline Multi-Perspective Evidence Synthesis constructs a comprehensive set of evidence. This includes instance-centric evidence, which captures the instance-centric multimodal information of mentions and entities,…
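To make the task concrete, the sketch below shows one simple way instance-centric evidence could be used to rank candidate entities for a mention without supervision. It is an illustration only and not the MSR-MEL method: the function names, the toy two-dimensional embeddings, and the equal weighting of text and image similarity are all assumptions introduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_entities(mention, entities, text_weight=0.5):
    """Rank candidate entities by a weighted sum of text and image similarity.

    `mention` and each entity are dicts with hypothetical "text" and "image"
    embedding vectors; `text_weight` is an assumed fusion weight.
    """
    scored = []
    for name, feats in entities.items():
        text_sim = cosine(mention["text"], feats["text"])
        image_sim = cosine(mention["image"], feats["image"])
        score = text_weight * text_sim + (1.0 - text_weight) * image_sim
        scored.append((name, score))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: entity_a matches the mention in both modalities, entity_b in neither.
mention = {"text": [1.0, 0.0], "image": [0.0, 1.0]}
entities = {
    "entity_a": {"text": [1.0, 0.0], "image": [0.0, 1.0]},
    "entity_b": {"text": [0.0, 1.0], "image": [1.0, 0.0]},
}
ranking = rank_entities(mention, entities)
```

In this toy setup the mention links to the top-ranked candidate. A multi-perspective approach as described in the abstract would draw on further kinds of evidence beyond these per-instance similarities.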
