Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning

Jiajin Liu; Dongzhe Fan; Jiacheng Shen; Chuanhao Ji; Daochen Zha; Qiaoyu Tan

arXiv:2506.10282·cs.LG·June 13, 2025

TL;DR

This paper introduces Graph-MLLM, a benchmark for multimodal graph learning that evaluates three paradigms for integrating multimodal large language models with graph data (MLLM-as-Encoder, MLLM-as-Aligner, and MLLM-as-Predictor) across multiple datasets.

Contribution

The paper provides a unified benchmark for multimodal graph learning that systematically compares the three paradigms, identifies which strategies are effective, and reports state-of-the-art performance.

Findings

01  Joint visual and textual attribute consideration improves performance.

02  Converting visual attributes into textual descriptions enhances results.

03  Fine-tuning MLLMs yields state-of-the-art performance without explicit graph structure.

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in representing and understanding diverse modalities. However, they typically focus on modality alignment in a pairwise manner while overlooking structural relationships across data points. Integrating multimodality with structured graph information (i.e., multimodal graphs, MMGs) is essential for real-world applications such as social networks, healthcare, and recommendation systems. Existing MMG learning methods fall into three paradigms based on how they leverage MLLMs: Encoder, Aligner, and Predictor. MLLM-as-Encoder focuses on enhancing graph neural networks (GNNs) via multimodal feature fusion; MLLM-as-Aligner aligns multimodal attributes in language or hidden space to enable LLM-based graph reasoning; MLLM-as-Predictor treats MLLMs as standalone reasoners with in-context learning or fine-tuning.…
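As a rough illustration of the MLLM-as-Encoder paradigm described above, the sketch below fuses per-node text and image embeddings and then runs one neighbor-averaging step of the kind a GNN layer performs. It is a minimal toy under stated assumptions, not the benchmark's implementation: the embeddings are made-up numbers standing in for MLLM outputs, concatenation is just one possible fusion choice, and the names `fuse_features` and `mean_aggregate` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def fuse_features(text_emb: Vector, image_emb: Vector) -> Vector:
    """Fuse two modalities by concatenation (an assumed, simple fusion choice)."""
    return text_emb + image_emb


def mean_aggregate(
    features: Dict[int, Vector], neighbors: Dict[int, List[int]]
) -> Dict[int, Vector]:
    """One message-passing step: average each node's vector with its neighbors'.

    No learned weights are involved; a real GNN layer would add them.
    """
    out: Dict[int, Vector] = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors.get(node, [])]
        out[node] = [sum(v[i] for v in group) / len(group) for i in range(len(vec))]
    return out


# Toy path graph 0 - 1 - 2; the 2-d embeddings are placeholders for MLLM outputs.
text_embs = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
image_embs = {0: [0.0, 2.0], 1: [2.0, 0.0], 2: [2.0, 2.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

fused = {n: fuse_features(text_embs[n], image_embs[n]) for n in text_embs}
hidden = mean_aggregate(fused, neighbors)
print(hidden[0])  # [0.5, 0.5, 1.0, 1.0]
```

In this toy setup the graph structure enters only through `neighbors`; the Predictor paradigm, by contrast, would pass node attributes to the MLLM directly instead of to a GNN.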

Taxonomy

Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Topic Modeling

Methods: Lib · Focus