ReasonMap: Towards Fine-Grained Visual Reasoning from Transit Maps
Sicheng Feng, Song Wang, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, Xinchao Wang

TL;DR
ReasonMap is a benchmark of high-resolution transit maps and question-answer pairs for evaluating the fine-grained visual reasoning of multimodal models. Its results show that strong performance depends on genuine visual grounding.
Contribution
The paper presents ReasonMap, a novel high-resolution transit map benchmark with a two-level evaluation pipeline for assessing multimodal large language models' reasoning capabilities.
Findings
Among open-source models, base variants outperform their reasoning-tuned counterparts.
Among closed-source models, the trend reverses: reasoning-tuned variants outperform base variants.
Strong visual grounding is essential for high performance.
Abstract
Multimodal large language models (MLLMs) have demonstrated significant progress in semantic scene understanding and text-image alignment, with reasoning variants enhancing performance on more complex tasks involving mathematics and logic. Their fine-grained visual reasoning, however, remains less thoroughly evaluated. To bridge this gap, we introduce ReasonMap, a novel benchmark specifically designed to evaluate this capability on transit maps. ReasonMap encompasses high-resolution transit maps from 30 cities and includes 1,008 question-answer pairs spanning two question types and three templates. Furthermore, we design a two-level evaluation pipeline that properly assesses answer correctness and quality. Our comprehensive evaluation of 16 popular MLLMs reveals a counterintuitive pattern: among open-source models, base variants outperform their reasoning-tuned counterparts, whereas the opposite trend is observed in closed-source models. Further analysis under the visual-masking…
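The abstract names a two-level evaluation (correctness, then quality) but this page does not describe how either level is computed. The following is only an illustrative sketch of how such a split could look for route-planning answers, not the authors' pipeline. The toy map, the `Segment` type, the transfer penalty, and the scoring formula are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One leg of a route: ride `line` from `board` to `alight` (hypothetical format)."""
    line: str
    board: str
    alight: str


# Hypothetical toy map: line name -> ordered list of stops.
TOY_MAP = {
    "Red": ["A", "B", "C", "D"],
    "Blue": ["C", "E", "F"],
}


def is_correct(route, origin, destination, transit_map):
    """Level 1 (assumed): the route is a valid, connected path from origin to destination."""
    if not route:
        return False
    if route[0].board != origin or route[-1].alight != destination:
        return False
    for i, seg in enumerate(route):
        stops = transit_map.get(seg.line)
        # Both stops must exist on the named line and must differ.
        if stops is None or seg.board not in stops or seg.alight not in stops:
            return False
        if seg.board == seg.alight:
            return False
        # Each leg must start where the previous leg ended.
        if i > 0 and route[i - 1].alight != seg.board:
            return False
    return True


def route_cost(route, transit_map, transfer_penalty=2):
    """Stops travelled plus an assumed fixed penalty per transfer."""
    travelled = 0
    for seg in route:
        stops = transit_map[seg.line]
        travelled += abs(stops.index(seg.board) - stops.index(seg.alight))
    return travelled + transfer_penalty * (len(route) - 1)


def quality_score(route, reference, origin, destination, transit_map):
    """Level 2 (assumed): 0.0 if incorrect, else reference cost / route cost, capped at 1.0."""
    if not is_correct(route, origin, destination, transit_map):
        return 0.0
    ratio = route_cost(reference, transit_map) / route_cost(route, transit_map)
    return min(1.0, ratio)


reference = [Segment("Red", "A", "C"), Segment("Blue", "C", "F")]
detour = [Segment("Red", "A", "D"), Segment("Red", "D", "C"), Segment("Blue", "C", "F")]
broken = [Segment("Red", "A", "B"), Segment("Blue", "C", "F")]

for name, route in [("reference", reference), ("detour", detour), ("broken", broken)]:
    print(name, is_correct(route, "A", "F", TOY_MAP),
          quality_score(route, reference, "A", "F", TOY_MAP))
```

Separating the two levels keeps a binary validity check apart from a graded score, so a valid but roundabout route is still counted as correct while scoring lower on quality.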
Taxonomy
Topics: Natural Language Processing Techniques · Constraint Satisfaction and Optimization · Multimodal Machine Learning Applications
Methods: Balanced Selection
