Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual Augmentation

Cong-Duy Nguyen; Xiaobao Wu; Thong Nguyen; Shuai Zhao; Khoi Le; Viet-Anh Nguyen; Feng Yichao; Anh Tuan Luu

arXiv:2501.14166·cs.CV·January 27, 2025

TL;DR

This paper introduces JD-CCL and CVaCPT, novel methods that improve multimodal entity linking by selecting challenging negative samples and enhancing visual representations, leading to more robust and accurate models.

Contribution

The paper proposes JD-CCL and CVaCPT, innovative techniques that address limitations in contrastive learning and visual variation handling in multimodal entity linking.

Findings

01. Significant improvement on benchmark MEL datasets

02. Enhanced robustness against easy negative samples

03. Better visual representation through multi-view synthetic images

Abstract

Previous research on multimodal entity linking (MEL) has mainly used contrastive learning as the primary objective. However, by using the rest of the batch as negative samples without careful consideration, these studies risk leveraging easy features and potentially overlooking essential details that make entities unique. In this work, we propose JD-CCL (Jaccard Distance-based Conditional Contrastive Learning), a novel approach designed to enhance the matching ability of multimodal entity linking models. JD-CCL leverages meta-information to select negative samples with similar attributes, making the linking task more challenging and the resulting model more robust. Additionally, to address the limitations caused by variations within the visual modality among mentions and entities, we introduce a novel method, CVaCPT (Contextual Visual-aid Controllable Patch Transform). It enhances visual representations by…
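The abstract says that JD-CCL selects negative samples with similar attributes but does not spell out the procedure here. As a rough illustration only, the sketch below assumes each entity's meta-information is a set of attribute strings and treats the entities with the smallest Jaccard distance to the target as the hardest negatives. The function names, the toy entities, and the top-k selection rule are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_hard_negatives(
    target: str, attributes: Dict[str, Set[str]], k: int
) -> List[str]:
    """Pick the k entities whose attribute sets are closest to the target's.

    Assumption: "similar attributes" is modeled as small Jaccard distance
    between attribute sets; the paper's actual criterion may differ.
    """
    target_attrs = attributes[target]
    candidates = [name for name in attributes if name != target]
    # Smaller distance means more shared attributes, hence a harder negative.
    # The name is a secondary key so that ties are broken deterministically.
    candidates.sort(
        key=lambda name: (jaccard_distance(target_attrs, attributes[name]), name)
    )
    return candidates[:k]


# Hypothetical toy entities and attributes, for illustration only.
entity_attributes = {
    "entity_a": {"human", "politician", "male"},
    "entity_b": {"human", "politician", "female"},
    "entity_c": {"human", "athlete", "male"},
    "entity_d": {"city", "capital"},
}
hard_negatives = select_hard_negatives("entity_a", entity_attributes, k=2)
```

In this toy setup, the two entities that share attributes with `entity_a` are preferred over the unrelated `entity_d`, which is the kind of easy negative a random in-batch sample would often include.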

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Contrastive Learning