Mario: Multimodal Graph Reasoning with Large Language Models

Yuanfu Sun; Kang Li; Pengkang Guo; Jiajin Liu; Qiaoyu Tan

arXiv:2603.05181 · cs.CV · March 27, 2026

Open Access

TL;DR

Mario introduces a novel framework for multimodal graph reasoning that leverages large language models to better understand complex relationships in image-text data, addressing cross-modal consistency and heterogeneity.

Contribution

The paper presents a unified approach combining graph-conditioned vision-language modeling and modality-adaptive instruction tuning for improved multimodal graph reasoning.

Findings

01. Mario outperforms state-of-the-art models in node classification.

02. Mario achieves superior results in link prediction tasks.

03. The framework is effective in both supervised and zero-shot scenarios.

Abstract

Recent advances in large language models (LLMs) have opened new avenues for multimodal reasoning. Yet most existing methods still rely on pretrained vision-language models (VLMs) to encode image-text pairs in isolation, ignoring the relational structure that real-world multimodal data naturally form. This motivates reasoning on multimodal graphs (MMGs), where each node has textual and visual attributes and edges provide structural cues. Enabling LLM-based reasoning on such heterogeneous multimodal signals while preserving graph topology introduces two key challenges: resolving weak cross-modal consistency and handling heterogeneous modality preference. To address these challenges, we propose Mario, a unified framework that resolves both simultaneously and enables effective LLM-based reasoning over MMGs. Mario consists of two stages. First, a graph-conditioned VLM…
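To make the MMG definition in the abstract concrete, here is a minimal Python sketch of a graph whose nodes carry a textual attribute and a visual attribute, with edges supplying the structural cues. It illustrates the data structure only and is not the authors' implementation: the names `MMGNode`, `MultimodalGraph`, `build_example_graph`, the `image_path` field, the sample product data, and the choice of undirected edges are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MMGNode:
    """One node of a multimodal graph: a text attribute plus a visual attribute."""
    node_id: int
    text: str
    image_path: str  # hypothetical stand-in for the node's image


@dataclass
class MultimodalGraph:
    """Nodes with text and image attributes, connected by edges."""
    nodes: dict = field(default_factory=dict)      # node_id -> MMGNode
    adjacency: dict = field(default_factory=dict)  # node_id -> set of neighbor ids

    def add_node(self, node: MMGNode) -> None:
        self.nodes[node.node_id] = node
        self.adjacency.setdefault(node.node_id, set())

    def add_edge(self, u: int, v: int) -> None:
        # Edges are treated as undirected here; this is an assumption of the sketch.
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.adjacency[u].add(v)
        self.adjacency[v].add(u)

    def neighbors(self, node_id: int) -> list:
        # Return neighbor nodes in ascending id order so the output is deterministic.
        return [self.nodes[n] for n in sorted(self.adjacency[node_id])]


def build_example_graph() -> MultimodalGraph:
    """Build a three-node toy graph with made-up product data."""
    g = MultimodalGraph()
    g.add_node(MMGNode(0, "red running shoes", "img/0.jpg"))
    g.add_node(MMGNode(1, "blue running shoes", "img/1.jpg"))
    g.add_node(MMGNode(2, "sports socks", "img/2.jpg"))
    g.add_edge(0, 1)
    g.add_edge(0, 2)
    return g


if __name__ == "__main__":
    graph = build_example_graph()
    for nb in graph.neighbors(0):
        print(nb.node_id, nb.text, nb.image_path)
```

A model that encodes each image-text pair in isolation would see only the `text` and `image_path` of a single node. Reasoning over the graph means the output of `neighbors` is also available as context for that node.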

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Topic Modeling