Multimodal Graph Representation Learning with Dynamic Information Pathways
Xiaobin Hong, Mingkai Lin, Xiaoli Wang, Chaoqun Wang, Wenzhong Li

TL;DR
This paper introduces DiP, a novel framework for multimodal graph learning that uses dynamic message routing and modality-specific pseudo nodes to improve flexibility and performance in heterogeneous graph tasks.
Contribution
The paper proposes a dynamic information pathway framework that uses pseudo nodes for adaptive, expressive, and efficient multimodal graph learning, surpassing methods that rely on static structures or dense attention.
Findings
DiP outperforms baseline methods on multiple benchmarks.
The approach achieves linear complexity in message propagation.
Experimental results validate the effectiveness of dynamic message routing.
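To make the linear-complexity finding concrete, the following is a back-of-the-envelope count, not the paper's own analysis. It assumes N nodes in a modality and a fixed, small number K of pseudo nodes, with messages routed node-to-pseudo-node and back. The names `dense_attention_pairs`, `pseudo_node_pairs`, and `num_pseudo` are hypothetical and chosen only for illustration.

```python
def dense_attention_pairs(num_nodes: int) -> int:
    """Pairwise scores needed when every node attends to every other node."""
    return num_nodes * num_nodes


def pseudo_node_pairs(num_nodes: int, num_pseudo: int) -> int:
    """Pairwise scores needed when nodes only exchange messages with pseudo nodes.

    Assumption: one gather pass (nodes -> pseudo nodes) and one scatter pass
    (pseudo nodes -> nodes), each scoring num_nodes * num_pseudo pairs.
    """
    return 2 * num_nodes * num_pseudo


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        print(n, dense_attention_pairs(n), pseudo_node_pairs(n, num_pseudo=8))
```

Under these assumptions, doubling the number of nodes doubles the pseudo-node cost but quadruples the dense-attention cost, which is the sense in which routing through a fixed set of pseudo nodes scales linearly.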
Abstract
Multimodal graphs, where nodes contain heterogeneous features such as images and text, are increasingly common in real-world applications. Effectively learning on such graphs requires both adaptive intra-modal message passing and efficient inter-modal aggregation. However, most existing approaches to multimodal graph learning are extended from conventional graph neural networks and rely on static structures or dense attention, which limits their flexibility and the expressiveness of the node embeddings they learn. In this paper, we propose a novel multimodal graph representation learning framework with Dynamic information Pathways (DiP). By introducing modality-specific pseudo nodes, DiP enables dynamic message routing within each modality via proximity-guided pseudo-node interactions and captures inter-modality dependence through efficient information pathways in a shared state space. This design…
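The idea of proximity-guided routing through pseudo nodes can be sketched for a single modality as follows. This is a minimal illustration under stated assumptions, not DiP's actual formulation: proximity is taken to be negative squared Euclidean distance, weights come from a softmax, there are no learned parameters, and the inter-modality pathways in the shared state space are not modeled. All function names here are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neg_sq_dist(a, b):
    """Proximity score (assumed): negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Combine equal-length vectors using the given weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def route_through_pseudo_nodes(node_feats, pseudo_feats):
    """One gather/scatter round between nodes and pseudo nodes.

    Gather: each pseudo node aggregates node features, weighted by proximity.
    Scatter: each node reads back from the updated pseudo nodes, again
    weighted by proximity. Nodes never attend to each other directly.
    """
    gathered = [
        weighted_sum(softmax([neg_sq_dist(p, x) for x in node_feats]), node_feats)
        for p in pseudo_feats
    ]
    return [
        weighted_sum(softmax([neg_sq_dist(x, p) for p in gathered]), gathered)
        for x in node_feats
    ]


if __name__ == "__main__":
    # Toy data: two well-separated clusters and one pseudo node near each.
    nodes = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
    pseudo = [[0.0, 0.0], [10.0, 10.0]]
    for row in route_through_pseudo_nodes(nodes, pseudo):
        print([round(v, 3) for v in row])
```

In this toy setting each node ends up dominated by the pseudo node closest to it, so information flows mainly among nearby nodes even though no node-to-node edges are scored. A real model would presumably learn the pseudo-node states and the proximity function rather than fix them as done here.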
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Multimodal Machine Learning Applications
