Toward Effective Multimodal Graph Foundation Model: A Divide-and-Conquer Based Approach
Sicheng Liu, Xunkai Li, Daohan Su, Ru Zhang, Hongchao Qin, Ronghua Li, Guoren Wang

TL;DR
This paper introduces PLANET, a divide-and-conquer framework for Multimodal Graph Foundation Models (MGFMs) that explicitly models both modality interaction and modality alignment; the authors report that it outperforms state-of-the-art baselines on multiple tasks.
Contribution
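The paper proposes PLANET, a divide-and-conquer framework that decouples modality interaction and alignment in Multimodal Graph Foundation Models (MGFMs). It targets two limitations of existing models: they do not explicitly model modality interaction, and their modality alignment is sub-optimal.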
Findings
PLANET outperforms state-of-the-art baselines on multiple tasks.
Embedding-wise Domain Gating enhances local cross-modal context.
Node-wise Discretization Retrieval improves global modality alignment.
Abstract
Graph Foundation Models (GFMs) have achieved remarkable success in generalizing across diverse domains. However, they mainly focus on Text-Attributed Graphs (TAGs), leaving Multimodal-Attributed Graphs (MAGs) largely untapped. Developing Multimodal Graph Foundation Models (MGFMs) allows for leveraging the rich multimodal information in MAGs and extends applicability to broader types of downstream tasks. While recent MGFMs integrate diverse modality information, our empirical investigation reveals two fundamental limitations of existing MGFMs: (1) they fail to explicitly model modality interaction, which is essential for capturing intricate cross-modal semantics beyond simple aggregation, and (2) they exhibit sub-optimal modality alignment, which is critical for bridging the significant semantic disparity between distinct modal spaces. To address these challenges, we propose PLANET (graPh…
Taxonomy
Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Graph Theory and Algorithms
