MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
Zhiheng Song, Jingshuai Zhang, Chuan Qin, Chao Wang, Chao Chen, Longfei Xu, Kaikui Liu, Xiangxiang Chu, Hengshu Zhu

TL;DR
MobilityBench is a reproducible benchmark for evaluating large language model-based route-planning agents in real-world mobility scenarios. It shows that current models handle basic route information retrieval well but struggle with preference-constrained route planning.
Contribution
This work introduces MobilityBench, a scalable, reproducible benchmark with a multi-dimensional evaluation protocol for assessing LLM-based route-planning agents in real-world settings.
Findings
Models perform well on basic route information retrieval.
Models struggle with preference-constrained route planning.
Benchmark and toolkit are publicly available for further research.
Abstract
Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility. In this study, we introduce MobilityBench, a scalable benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. MobilityBench is constructed from large-scale, anonymized real user queries collected from Amap and covers a broad spectrum of route-planning intents across multiple cities worldwide. To enable reproducible, end-to-end evaluation, we design a deterministic API-replay sandbox that eliminates environmental variance from live services. We further propose a multi-dimensional…
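The abstract does not describe how the API-replay sandbox is built, so the following is only a minimal sketch of the general record-and-replay idea, not the authors' implementation. The class name `ReplaySandbox`, the tool name `"route"`, and the parameter and response fields are all hypothetical. Each tool call is fingerprinted from its canonicalized arguments, and the sandbox returns the response captured at recording time instead of contacting a live mapping service.

```python
import hashlib
import json


class ReplaySandbox:
    """Serve recorded tool responses instead of calling a live mapping service.

    Hypothetical illustration of record-and-replay; not MobilityBench's code.
    """

    def __init__(self, recordings=None):
        # Maps a request fingerprint to the response captured at recording time.
        self._recordings = dict(recordings or {})

    @staticmethod
    def _key(tool, params):
        # Canonical JSON (sorted keys) so argument order does not change the key.
        payload = json.dumps({"tool": tool, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, tool, params, response):
        # Store the response observed for this exact request.
        self._recordings[self._key(tool, params)] = response

    def call(self, tool, params):
        key = self._key(tool, params)
        if key not in self._recordings:
            # Fail loudly rather than fall back to a live, non-deterministic call.
            raise KeyError(f"no recorded response for {tool} with {params}")
        return self._recordings[key]


# Usage with made-up values: the same request always yields the same response,
# regardless of the order in which its arguments are given.
sandbox = ReplaySandbox()
sandbox.record("route", {"origin": "A", "destination": "B"}, {"duration_min": 25})
replayed = sandbox.call("route", {"destination": "B", "origin": "A"})
```

Raising on an unrecorded request, rather than silently querying a live service, is one way to keep every evaluation run identical. A real sandbox would also have to decide how to treat near-duplicate requests, which this sketch ignores.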
Taxonomy
Topics: Transportation and Mobility Innovations · Human Mobility and Location-Based Analysis · Mobile Crowdsensing and Crowdsourcing
