Multi-level Matching Network for Multimodal Entity Linking
Zhiwei Hu, Víctor Gutiérrez-Basulto, Ru Li, Jeff Z. Pan

TL;DR
This paper introduces M3EL, a multi-level matching network for multimodal entity linking that models both intra-modal and bidirectional cross-modal interactions, and reports improved results on WikiMEL, RichpediaMEL, and WikiDiverse.
Contribution
The paper proposes a multi-level matching network with intra-modal contrastive learning and bidirectional cross-modal strategies for enhanced multimodal entity linking.
Findings
Outperforms state-of-the-art baselines on WikiMEL, RichpediaMEL, and WikiDiverse datasets.
Effectively captures intra-modal differences and bidirectional cross-modal interactions.
Reports gains in linking accuracy on each of the three datasets.
Abstract
Multimodal entity linking (MEL) aims to link ambiguous mentions within multimodal contexts to corresponding entities in a multimodal knowledge base. Most existing approaches to MEL are based on representation learning or vision-and-language pre-training mechanisms for exploring the complementary effect among multiple modalities. However, these methods suffer from two limitations. On the one hand, they overlook the possibility of considering negative samples from the same modality. On the other hand, they lack mechanisms to capture bidirectional cross-modal interaction. To address these issues, we propose a Multi-level Matching network for Multimodal Entity Linking (M3EL). Specifically, M3EL is composed of three different modules: (i) a Multimodal Feature Extraction module, which extracts modality-specific representations with a multimodal encoder and introduces an intra-modal…
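The first limitation named in the abstract concerns negative samples drawn from the same modality. The following is a minimal sketch of how such an intra-modal contrast is commonly set up, assuming an in-batch InfoNCE-style loss over cosine similarities. It is not the authors' implementation: the function names, the temperature value, and the toy vectors are all illustrative assumptions that do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def intra_modal_contrastive_loss(mentions, entities, temperature=0.1):
    """In-batch InfoNCE loss within a single modality (e.g. text only).

    mentions[i] and entities[i] form a positive pair; every entities[j]
    with j != i acts as a same-modality negative for mentions[i].
    The temperature is an assumed hyperparameter, not a value from the paper.
    """
    total = 0.0
    for i, mention in enumerate(mentions):
        logits = [cosine(mention, entity) / temperature for entity in entities]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(mentions)


# Toy text embeddings (made-up numbers, for illustration only).
mention_text = [[1.0, 0.0], [0.0, 1.0]]
entity_text_aligned = [[0.9, 0.1], [0.1, 0.9]]   # positives sit on the diagonal
entity_text_swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives are mismatched

loss_aligned = intra_modal_contrastive_loss(mention_text, entity_text_aligned)
loss_swapped = intra_modal_contrastive_loss(mention_text, entity_text_swapped)
print(loss_aligned < loss_swapped)
```

In this sketch the loss is small when each mention is closest to its own entity and large when a same-modality negative is closer, which is the signal an intra-modal contrastive objective is meant to provide. The same function could be applied separately to visual embeddings under the same assumptions.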
Taxonomy
Topics: Web Data Mining and Analysis
Methods: Contrastive Learning
