Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning
Jiajin Liu, Dongzhe Fan, Jiacheng Shen, Chuanhao Ji, Daochen Zha, Qiaoyu Tan

TL;DR
This paper introduces Graph-MLLM, a comprehensive benchmark for multimodal graph learning that evaluates three paradigms for integrating multimodal large language models with graph data (MLLM-as-Encoder, MLLM-as-Aligner, and MLLM-as-Predictor) across multiple datasets.
Contribution
It provides a unified benchmark for multimodal graph learning, systematically comparing three paradigms and offering insights into effective strategies and state-of-the-art performance.
Findings
Jointly considering visual and textual attributes improves performance.
Converting visual attributes into textual descriptions enhances results.
Fine-tuning MLLMs yields state-of-the-art performance even without explicit graph structure information.
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in representing and understanding diverse modalities. However, they typically focus on modality alignment in a pairwise manner while overlooking structural relationships across data points. Integrating multimodality with structured graph information (i.e., multimodal graphs, MMGs) is essential for real-world applications such as social networks, healthcare, and recommendation systems. Existing MMG learning methods fall into three paradigms based on how they leverage MLLMs: Encoder, Aligner, and Predictor. MLLM-as-Encoder focuses on enhancing graph neural networks (GNNs) via multimodal feature fusion; MLLM-as-Aligner aligns multimodal attributes in language or hidden space to enable LLM-based graph reasoning; MLLM-as-Predictor treats MLLMs as standalone reasoners with in-context learning or fine-tuning.…
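As a rough illustration of the MLLM-as-Encoder idea described above (multimodal feature fusion feeding a GNN), the following minimal sketch concatenates per-node text and image embeddings and applies one round of mean neighbor aggregation. It is not the benchmark's implementation: the function names `fuse_features` and `mean_aggregate`, the concatenation-based fusion, the mean aggregator, and the toy embeddings are all assumptions chosen for brevity.

```python
from typing import Dict, List


def fuse_features(text_emb: List[float], image_emb: List[float]) -> List[float]:
    """Fuse modalities by concatenation (an assumed, simple fusion choice)."""
    return list(text_emb) + list(image_emb)


def mean_aggregate(
    features: Dict[int, List[float]],
    neighbors: Dict[int, List[int]],
) -> Dict[int, List[float]]:
    """One message-passing round: average each node's feature with its neighbors'."""
    out: Dict[int, List[float]] = {}
    for node, feat in features.items():
        # Include the node itself (a self-loop) alongside its neighbors.
        group = [feat] + [features[n] for n in neighbors.get(node, [])]
        out[node] = [
            sum(vec[i] for vec in group) / len(group) for i in range(len(feat))
        ]
    return out


# Toy graph with hypothetical embeddings standing in for MLLM encoder outputs.
text = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
image = {0: [2.0], 1: [4.0], 2: [6.0]}
edges = {0: [1], 1: [0, 2], 2: [1]}  # undirected path 0 - 1 - 2

fused = {n: fuse_features(text[n], image[n]) for n in text}
hidden = mean_aggregate(fused, edges)
print(hidden[0])
```

In a real pipeline the embeddings would come from a pretrained multimodal encoder and the aggregation step would be a trained GNN layer; the sketch only shows how the two stages connect.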
Taxonomy
Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Topic Modeling
Methods: Lib · Focus
