VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang

TL;DR
This paper introduces VisionGraph, a benchmark for evaluating large multimodal models (LMMs) on visual graph theory problems, and proposes a Description-Program-Reasoning (DPR) approach to improve reasoning accuracy. In the reported comparison, GPT-4V outperforms Gemini Pro in multi-step graph reasoning.
Contribution
The authors present VisionGraph as the first benchmark for multimodal graph theory problems and introduce the DPR method to strengthen LMM reasoning.
Findings
GPT-4V outperforms Gemini Pro in multi-step reasoning
LMMs have limited perception accuracy for graphical structures
DPR significantly boosts reasoning performance, achieving SOTA results
Abstract
Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, remarkably improving the performance of mathematical reasoning in a visual context. Yet, a challenging type of visual math lies in the multimodal graph theory problem, which demands that LMMs understand the graphical structures accurately and perform multi-step reasoning on the visual graph. Additionally, exploring multimodal graph theory problems will lead to more effective strategies in fields like biology, transportation, and robotics planning. To step forward in this direction, we are the first to design a benchmark named VisionGraph, used to explore the capabilities of advanced LMMs in solving multimodal graph theory problems. It encompasses eight complex graph problem tasks, from connectivity to shortest path problems. Subsequently, we present a Description-Program-Reasoning…
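To make the two task types named in the abstract concrete (connectivity and shortest path), here is a minimal Python sketch of the graph computations involved. It is an illustration only, not the paper's DPR pipeline or benchmark code: the edge list, the function names `build_adjacency`, `is_connected`, and `shortest_path_length`, and the choice of breadth-first search and Dijkstra's algorithm are all assumptions made for this example. It also assumes the graph structure has already been extracted from the image as (node, node, weight) triples.

```python
import heapq
from collections import deque


def build_adjacency(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def is_connected(adj, source, target):
    """Breadth-first search: return True if any path links source and target."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def shortest_path_length(adj, source, target):
    """Dijkstra's algorithm for non-negative weights; None if unreachable."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for nxt, w in adj.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None


# Hypothetical graph: nodes 0-3 form one component, nodes 4-5 another.
EDGES = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5), (4, 5, 1)]
ADJ = build_adjacency(EDGES)

connected_0_3 = is_connected(ADJ, 0, 3)        # True
connected_0_5 = is_connected(ADJ, 0, 5)        # False
length_0_3 = shortest_path_length(ADJ, 0, 3)   # 8, via 0 -> 2 -> 1 -> 3
```

The point of the sketch is that once the structure is read correctly, these tasks reduce to standard algorithms; the Findings above suggest that reading the structure from the image is where LMMs struggle.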
Taxonomy
Topics: Geographic Information Systems Studies · Multimodal Machine Learning Applications · Semantic Web and Ontologies
