$M^3EL$: A Multi-task Multi-topic Dataset for Multi-modal Entity Linking

Fang Wang; Shenglin Yin; Xiaoying Bai; Minghao Hu; Tianwei Yan; Yi Liang

arXiv:2410.18096·cs.IR·October 25, 2024

TL;DR

The paper introduces $M^3EL$, a large-scale multi-modal entity linking dataset covering diverse tasks and topics, and demonstrates its effectiveness in improving model performance through a new training strategy.

Contribution

It presents a novel large-scale dataset for multi-modal entity linking with diverse topics and tasks, along with a modality-augmented training strategy to enhance model generalization.

Findings

01. Existing models perform poorly due to limited data and coverage.

02. $M^3EL$ significantly improves model accuracy across tasks.

03. The proposed training strategy enhances multi-modal model adaptability.

Abstract

Multi-modal Entity Linking (MEL) is a fundamental component for various downstream tasks. However, existing MEL datasets suffer from small scale, scarcity of topic types, and limited coverage of tasks, making them incapable of effectively enhancing the entity linking capabilities of multi-modal models. To address these obstacles, we propose a dataset construction pipeline and publish $M^3EL$, a large-scale dataset for MEL. $M^3EL$ includes 79,625 instances, covering 9 diverse multi-modal tasks and 5 different topics. In addition, to further improve the model's adaptability to multi-modal tasks, we propose a modality-augmented training strategy. Using $M^3EL$ as a corpus, we train the $CLIP_{ND}$ model based on CLIP (ViT-B-32) and conduct a comparative analysis with existing multi-modal baselines. Experimental results show that…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques