Multi-level Mixture of Experts for Multimodal Entity Linking

Zhiwei Hu; Víctor Gutiérrez-Basulto; Zhiliang Xiang; Ru Li; Jeff Z. Pan

arXiv:2507.07108·cs.CV·July 11, 2025

Open Access

TL;DR

This paper introduces a Multi-level Mixture of Experts (MMoE) model for Multimodal Entity Linking. It targets two problems, mention ambiguity and the dynamic selection of modal content, to improve linking accuracy.

Contribution

It proposes MMoE, a multi-level mixture-of-experts framework that combines a description-aware mention enhancement module with dynamic feature selection modules for multimodal entity linking.

Findings

01 Outperforms state-of-the-art MEL methods in experiments

02 Effectively mitigates mention ambiguity issues

03 Dynamically selects relevant modal features

Abstract

Multimodal Entity Linking (MEL) aims to link ambiguous mentions within multimodal contexts to associated entities in a multimodal knowledge base. Existing approaches to MEL introduce multimodal interaction and fusion mechanisms to bridge the modality gap and enable multi-grained semantic matching. However, they do not address two important problems: (i) mention ambiguity, i.e., the lack of semantic content caused by the brevity and omission of key information in the mention's textual context; (ii) dynamic selection of modal content, i.e., to dynamically distinguish the importance of different parts of modal information. To mitigate these issues, we propose a Multi-level Mixture of Experts (MMoE) model for MEL. MMoE has four components: (i) the description-aware mention enhancement module leverages large language models to identify the WikiData descriptions that best match a mention,…
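As a rough illustration of what "dynamically distinguishing the importance" of different feature parts can look like, the following is a minimal soft-gated mixture-of-experts sketch in plain Python. It is a generic textbook formulation and not the MMoE architecture from the paper, whose details are cut off in the abstract above. The names `moe_combine`, `gate_weights`, and the two toy experts are hypothetical and exist only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def moe_combine(feature, gate_weights, experts):
    """Soft mixture of experts (illustrative assumption, not the paper's model).

    feature      : input feature vector (list of floats)
    gate_weights : one weight vector per expert; its dot product with the
                   feature is that expert's gating score
    experts      : callables mapping a feature vector to an output vector
    Returns the gate-weighted sum of expert outputs and the gate itself.
    """
    gate = softmax([dot(w, feature) for w in gate_weights])
    outputs = [expert(feature) for expert in experts]
    dim = len(outputs[0])
    mixed = [sum(g * out[i] for g, out in zip(gate, outputs)) for i in range(dim)]
    return mixed, gate


# Two toy experts: one scales the input, the other shifts it.
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]

mixed, gate = moe_combine([3.0, 0.0], gate_weights, experts)
print(gate)   # the first expert receives most of the weight for this input
print(mixed)
```

The gate is recomputed for every input, so different features route most of their weight to different experts. That per-input weighting is the general idea behind dynamic selection in mixture-of-experts models.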

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Sentiment Analysis and Opinion Mining