Towards Multimodal Graph Large Language Model

Xin Wang; Zeyang Zhang; Linxin Xiao; Haibo Chen; Chendi Ge; Wenwu Zhu

arXiv:2506.09738·cs.LG·November 26, 2025

TL;DR

This paper proposes a unified framework for Multi-modal Graph Large Language Models (MG-LLM) to enhance generalization across diverse multi-modal graph data and tasks, emphasizing multi-granularity, multi-scale features, and natural language interaction.

Contribution

The paper introduces five desired characteristics for MG-LLM, discusses the open challenges, reviews related work, and outlines future research directions for multi-modal graph learning.

Findings

01 Identifies five desired characteristics for MG-LLM

02 Highlights challenges and future directions in multi-modal graph learning

03 Summarizes relevant datasets for training MG-LLM

Abstract

Multi-modal graphs, which integrate diverse multi-modal features and relations, are ubiquitous in real-world applications. However, existing multi-modal graph learning methods are typically trained from scratch for specific graph data and tasks, failing to generalize across various multi-modal graph data and tasks. To bridge this gap, we explore the potential of Multi-modal Graph Large Language Models (MG-LLM) to unify and generalize across diverse multi-modal graph data and tasks. We propose a unified framework of multi-modal graph data, task, and model, discovering the inherent multi-granularity and multi-scale characteristics in multi-modal graphs. Specifically, we present five key desired characteristics for MG-LLM: 1) unified space for multi-modal structures and attributes, 2) capability of handling diverse multi-modal graph tasks, 3) multi-modal graph in-context learning, 4)…
