MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?
Ziqiao Shang, Lingyue Ge, Zi-Jian Cheng, Shi-Yu Tian, Zhenyu Huang, Wenbo Fu, Weiming Wu, Yang Chen, Xiangwen Zhang, Yulan Hu, Bin Liu, Yu-Feng Li, Lan-Zhe Guo

TL;DR
MapTab is a new multimodal benchmark that evaluates how well multimodal large language models (MLLMs) perform multi-criteria route planning from map images and tabular data, covering metro networks and tourist attractions worldwide.
Contribution
The paper introduces MapTab, a comprehensive benchmark for assessing MLLMs' multi-criteria reasoning in route planning tasks with multimodal inputs.
Findings
Current MLLMs struggle with multi-criteria multimodal reasoning.
Multimodal collaboration often underperforms unimodal approaches when visual perception is limited.
MapTab offers a challenging testbed for evaluating MLLMs' reasoning capabilities.
Abstract
Systematic evaluation of Multimodal Large Language Models (MLLMs) is crucial for advancing Artificial General Intelligence (AGI). However, existing benchmarks remain insufficient for rigorously assessing their reasoning capabilities under multi-criteria constraints. To bridge this gap, we introduce MapTab, a multimodal benchmark specifically designed to evaluate holistic multi-criteria reasoning in MLLMs via route planning tasks. MapTab requires MLLMs to perceive and ground visual cues from map images alongside route attributes (e.g., Time, Price) from structured tabular data. The benchmark encompasses two scenarios: Metromap, covering metro networks in 160 cities across 52 countries, and Travelmap, depicting 168 representative tourist attractions from 19 countries. In total, MapTab comprises 328 images, 196,800 route planning queries, and 3,936 QA queries, all incorporating 4 key…
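To make the task concrete, the following is a minimal sketch of multi-criteria route planning over a graph whose edges carry Time and Price attributes, as the abstract describes. Everything here is an illustrative assumption rather than MapTab's actual data format or evaluation method: the toy station graph, the `best_route` helper, and the choice to combine the criteria as a weighted sum fed to Dijkstra's algorithm are all hypothetical.

```python
import heapq

# Hypothetical toy network: station -> list of (neighbor, time_minutes, price).
# These values are invented for illustration and do not come from MapTab.
EDGES = {
    "A": [("B", 4, 2.0), ("C", 10, 1.0)],
    "B": [("D", 5, 2.0)],
    "C": [("D", 3, 0.5)],
    "D": [],
}


def best_route(edges, start, goal, w_time=1.0, w_price=1.0):
    """Dijkstra's algorithm over a weighted sum of time and price.

    The weighted sum is an assumed way to trade off the two criteria;
    the benchmark itself may define its criteria differently.
    Returns (total_cost, path), or None if the goal is unreachable.
    """
    frontier = [(0.0, start, [start])]  # (cost so far, station, path taken)
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, minutes, price in edges.get(node, []):
            if nxt not in settled:
                step = w_time * minutes + w_price * price
                heapq.heappush(frontier, (cost + step, nxt, path + [nxt]))
    return None


# Optimizing for time alone and for price alone selects different routes.
fastest = best_route(EDGES, "A", "D", w_time=1.0, w_price=0.0)
cheapest = best_route(EDGES, "A", "D", w_time=0.0, w_price=1.0)
print(fastest)   # (9.0, ['A', 'B', 'D'])
print(cheapest)  # (1.5, ['A', 'C', 'D'])
```

In this toy graph the fastest and cheapest routes differ, which is the kind of trade-off a model has to resolve when the criteria are read from a table and the topology from a map image.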
