Leveraging Intra-modal and Inter-modal Interaction for Multi-Modal Entity Alignment

Zhiwei Hu; Víctor Gutiérrez-Basulto; Zhiliang Xiang; Ru Li; Jeff Z. Pan

arXiv:2404.17590·cs.IR·April 30, 2024

Open Access

TL;DR

This paper introduces MIMEA, a multi-grained interaction framework for multi-modal entity alignment. Its four modules integrate intra-modal and inter-modal knowledge, and it outperforms state-of-the-art methods on two real-world datasets.

Contribution

The paper proposes MIMEA, a comprehensive framework with four modules that facilitate multi-granular interaction and fusion for multi-modal entity alignment, addressing modal heterogeneity challenges.

Findings

01

MIMEA outperforms state-of-the-art methods on two real-world datasets.

02

The probability-guided fusion effectively integrates uni-modal representations.

03

The optimal transport mechanism enhances modal interaction and alignment accuracy.

Abstract

Multi-modal entity alignment (MMEA) aims to identify equivalent entity pairs across different multi-modal knowledge graphs (MMKGs). Existing approaches focus on how to better encode and aggregate information from different modalities. However, leveraging multi-modal knowledge in entity alignment is not trivial because of modal heterogeneity. In this paper, we propose a Multi-Grained Interaction framework for Multi-Modal Entity Alignment (MIMEA), which effectively realizes multi-granular interaction within the same modality or between different modalities. MIMEA is composed of four modules: i) a Multi-modal Knowledge Embedding module, which extracts modality-specific representations with multiple individual encoders; ii) a Probability-guided Modal Fusion module, which employs a probability-guided approach to integrate uni-modal representations into joint-modal embeddings, while…

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Speech and dialogue systems

Methods: Contrastive Learning · Focus