RoadBench: Benchmarking MLLMs on Fine-Grained Spatial Understanding and Reasoning under Urban Road Scenarios
Jun Zhang, Jie Feng, Long Chen, Junhui Wang, Zhicheng Liu, Depeng Jin, Yong Li

TL;DR
RoadBench is a comprehensive benchmark designed to evaluate multimodal large language models' fine-grained spatial understanding and reasoning in urban road scenarios, revealing significant gaps in current models' capabilities.
Contribution
This work introduces RoadBench, a systematic benchmark with 9,121 test cases focusing on urban road markings and traffic systems to evaluate MLLMs' spatial reasoning.
Findings
Existing MLLMs perform poorly on fine-grained urban spatial tasks.
Many MLLMs underperform simple rule-based or random baselines.
RoadBench exposes critical shortcomings in current models' urban spatial understanding.
Abstract
Multimodal large language models (MLLMs) have demonstrated powerful capabilities in general spatial understanding and reasoning. However, their fine-grained spatial understanding and reasoning capabilities in complex urban scenarios have received little attention in either research or industry. To fill this gap, we focus on road markings as a typical example of fine-grained spatial elements in urban scenarios, given the essential role of the integrated road traffic network they form within cities. Centered on road markings and urban traffic systems, we propose RoadBench, a systematic benchmark that comprehensively evaluates MLLMs' fine-grained spatial understanding and reasoning capabilities using bird's-eye view (BEV) and first-person view (FPV) image inputs. The benchmark comprises six tasks totaling 9,121 strictly manually verified test cases. These tasks form a systematic evaluation…
Taxonomy
Topics: Multimodal Machine Learning Applications · Automated Road and Building Extraction · Geographic Information Systems Studies
