MyGram: Modality-aware Graph Transformer with Global Distribution for Multi-modal Entity Alignment
Zhifei Li, Ziyue Qin, Xiangyu Luo, Xiaoju Hou, Yue Zhao, Miao Zhang, Zhifang Huang, Kui Xiao, Bing Yang

TL;DR
MyGram is a novel multi-modal entity alignment method that leverages a modality-aware graph transformer with global distribution constraints to improve semantic matching across knowledge graphs.
Contribution
It introduces a modality diffusion learning module and Gram Loss for deep structural understanding and global distribution consistency in multi-modal entity alignment.
Findings
Outperforms baseline models on five datasets
Achieves up to 9.9% improvement in Hits@1
Demonstrates effectiveness of global distribution regularization
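For readers unfamiliar with the metric, Hits@1 is commonly computed as the fraction of source entities whose correct counterpart is ranked first among all candidate targets. The snippet below is a minimal sketch of that general definition, not the paper's evaluation code. The function name `hits_at_k`, the use of a dense similarity matrix, and the convention that the gold match for source entity i is target entity i are all assumptions made for illustration.

```python
import numpy as np

def hits_at_k(sim: np.ndarray, k: int = 1) -> float:
    """Fraction of rows whose gold target lands in the top-k candidates.

    Assumption (illustrative only): sim[i, j] is the similarity between
    source entity i and target entity j, and the gold match for source i
    is target i, i.e. the diagonal of the matrix.
    """
    gold = sim.diagonal()[:, None]
    # For each source, count the targets that score strictly higher than its gold match.
    better = (sim > gold).sum(axis=1)
    # The gold target is in the top-k when fewer than k targets beat it.
    return float((better < k).mean())

# Toy similarity matrix for three entity pairs (made-up numbers).
sim = np.array([
    [0.9, 0.1, 0.0],   # gold (0.9) is ranked first
    [0.2, 0.3, 0.8],   # gold (0.3) is beaten by 0.8
    [0.1, 0.2, 0.7],   # gold (0.7) is ranked first
])
print(hits_at_k(sim, k=1))
print(hits_at_k(sim, k=2))
```

In this toy case two of the three gold pairs are ranked first, so Hits@1 is 2/3, and all three fall within the top two.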
Abstract
Multi-modal entity alignment aims to identify equivalent entities between two multi-modal knowledge graphs by integrating multi-modal data, such as images and text, to enrich the semantic representations of entities. However, existing methods may overlook the structural contextual information within each modality, making them vulnerable to interference from shallow features. To address these challenges, we propose MyGram, a modality-aware graph transformer with global distribution for multi-modal entity alignment. Specifically, we develop a modality diffusion learning module to capture deep structural contextual information within modalities and enable fine-grained multi-modal fusion. In addition, we introduce a Gram Loss that acts as a regularization constraint by minimizing the volume of a 4-dimensional parallelotope formed by multi-modal features, thereby achieving global…
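The abstract is cut off before the Gram Loss is fully specified, so the following is only a sketch of the geometry it refers to. It relies on a standard fact: the volume of the parallelotope spanned by k vectors equals the square root of the determinant of their Gram matrix. The function name `gram_volume`, the L2 normalisation step, and the random stand-in vectors are assumptions for illustration; which four features the paper uses and how the volume enters the training objective are not stated here.

```python
import numpy as np

def gram_volume(vectors: np.ndarray) -> float:
    """Volume of the parallelotope spanned by the rows of `vectors` (shape k x d).

    Rows are L2-normalised first (an assumption of this sketch), so the
    result lies in [0, 1]: 0 for linearly dependent rows, 1 for mutually
    orthogonal rows.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    gram = unit @ unit.T                 # k x k matrix of pairwise inner products
    det = np.linalg.det(gram)
    return float(np.sqrt(max(det, 0.0)))  # clamp tiny negative values from round-off

rng = np.random.default_rng(0)

# Four hypothetical feature vectors for one entity, one per row (d = 8).
orthogonal = np.eye(4, 8)                              # mutually orthogonal rows
base = rng.normal(size=8)
nearly_aligned = base + 0.01 * rng.normal(size=(4, 8))  # four noisy copies of one vector

print(gram_volume(orthogonal))       # orthogonal rows span a unit-volume parallelotope
print(gram_volume(nearly_aligned))   # nearly parallel rows span almost no volume
```

Under this reading, driving the volume toward zero pulls the four feature vectors of an entity toward a common direction, which is one way a volume term can act as a consistency regulariser across modalities.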
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning
