Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning

Yingjie Zhu; Xuefeng Bai; Kehai Chen; Yang Xiang; Jun Yu; Min Zhang

arXiv:2412.13540 · cs.CL · June 9, 2025

TL;DR

This paper introduces VGCure, a benchmark for evaluating LVLMs on visual graph understanding, and proposes a structure-aware fine-tuning method to improve their reasoning abilities and robustness.

Contribution

The paper presents VGCure, a benchmark of 22 visual graph tasks, and a structure-aware fine-tuning framework that uses three self-supervised learning tasks to strengthen LVLMs' structure learning abilities.

Findings

01

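The 14 evaluated LVLMs perform poorly on basic graph understanding and reasoning tasks, particularly those involving relational or structurally complex information.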

02

The proposed structure-aware fine-tuning improves LVLMs' performance on fundamental graph understanding and reasoning tasks.

03

Enhanced models show increased robustness to complex visual graphs.

Abstract

Large Vision-Language Models (LVLMs) have demonstrated remarkable performance across diverse tasks. Despite great success, recent studies show that LVLMs encounter substantial limitations when engaging with visual graphs. To study the reason behind these limitations, we propose VGCure, a comprehensive benchmark covering 22 tasks for examining the fundamental graph understanding and reasoning capacities of LVLMs. Extensive evaluations conducted on 14 LVLMs reveal that LVLMs are weak in basic graph understanding and reasoning tasks, particularly those concerning relational or structurally complex information. Based on this observation, we propose a structure-aware fine-tuning framework to enhance LVLMs with structure learning abilities through three self-supervised learning tasks. Experiments validate the effectiveness of our method in improving LVLMs' performance on fundamental and…

Code & Models

Repositories

aaandy-zhu/vgcure
Official

Videos

Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning · Underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Semantic Web and Ontologies · Constraint Satisfaction and Optimization