CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and Aggregation
Daohan Su, Hao Liu, Xunkai Li, Yinlin Zhu, Xiong Yongfu, Yi Liu, Hongchao Qin, Rong-Hua Li, Guoren Wang

TL;DR
CAMPA is a decoupled multimodal graph learning framework that aligns cross-modal information during both propagation and aggregation, with the goal of keeping decoupled pipelines efficient while improving performance.
Contribution
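The paper proposes an alignment mechanism for decoupled Multimodal Graph Neural Networks (MGNNs) that targets modal conflict in the two stages where it arises, propagation and aggregation, to improve scalability and semantic consistency.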
Findings
CAMPA outperforms existing methods on benchmark datasets.
Decoupled MGNNs with CAMPA are more scalable for large graphs.
The framework maintains efficiency while improving accuracy.
Abstract
Multimodal Graph Neural Networks (MGNNs) have shown strong potential for learning from multimodal attributed graphs, yet most existing approaches rely on tightly coupled architectures that suffer from prohibitive computational overhead. In this paper, we present a systematic empirical analysis showing that decoupled MGNNs are substantially more efficient and scalable for large-scale graph learning. However, we identify a critical bottleneck in existing decoupled pipelines, namely modal conflict, which arises in both the propagation and aggregation stages. Specifically, independent multi-hop diffusion causes cross-modal semantic divergence during propagation, while naive fusion fails to align multi-hop feature trajectories during aggregation, jointly limiting effective representation learning. To address this challenge, we propose CAMPA, a Cross-modal Aligned Multimodal Propagation &…
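To make the abstract's terminology concrete, here is a minimal sketch of a generic decoupled pipeline of the kind it critiques: each modality is diffused independently over the graph for several hops, and the resulting multi-hop features are then fused naively. This is not CAMPA's method. The function names (`propagate`, `naive_fuse`), the mean-over-neighbors propagation rule, the hop-averaging fusion, and the toy graph are all illustrative assumptions.

```python
def propagate(neighbors, feats, num_hops):
    """Return [X, PX, P^2 X, ...] for one modality.

    Assumption: P averages each node with its neighbors (self-loop included).
    No learnable parameters are involved, so this step can be precomputed once.
    """
    hops = [feats]
    for _ in range(num_hops):
        prev = hops[-1]
        nxt = []
        for node, nbrs in enumerate(neighbors):
            group = [node] + list(nbrs)  # self-loop plus neighbors
            dim = len(prev[node])
            nxt.append(
                [sum(prev[j][d] for j in group) / len(group) for d in range(dim)]
            )
        hops.append(nxt)
    return hops


def naive_fuse(*modality_hops):
    """Average over hops within each modality, then concatenate modalities.

    Assumption: this stands in for the "naive fusion" the abstract mentions;
    nothing here aligns the modalities with each other.
    """
    num_nodes = len(modality_hops[0][0])
    fused = []
    for node in range(num_nodes):
        row = []
        for hops in modality_hops:
            dim = len(hops[0][node])
            row += [sum(h[node][d] for h in hops) / len(hops) for d in range(dim)]
        fused.append(row)
    return fused


# Toy path graph 0 - 1 - 2 with two hypothetical modalities per node.
neighbors = [[1], [0, 2], [1]]
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

# Each modality is diffused on its own; the two never interact here.
text_hops = propagate(neighbors, text_feats, num_hops=2)
image_hops = propagate(neighbors, image_feats, num_hops=2)

# One fused vector per node, ready for a downstream classifier.
fused = naive_fuse(text_hops, image_hops)
```

Because the two calls to `propagate` never exchange information and `naive_fuse` treats every hop identically, this sketch has no point at which cross-modal alignment could happen. According to the abstract, those are the two stages where modal conflict arises.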
