MAPWise: Evaluating Vision-Language Models for Advanced Map Queries
Srija Mukhopadhyay, Abhishek Rajgaria, Prerana Khatiwada, Vivek Gupta, Dan Roth

TL;DR
This paper introduces MAPWise, a new benchmark for evaluating vision-language models on complex map-based questions, revealing current limitations and guiding future improvements in spatial reasoning tasks.
Contribution
It presents a novel map question-answering benchmark with diverse maps and questions, facilitating research on VLMs' spatial understanding capabilities.
Findings
VLMs show significant gaps in map-based reasoning.
The benchmark covers three geographical regions (United States, India, China) and 43 question templates.
Insights suggest directions for enhancing VLM spatial reasoning.
Abstract
Vision-language models (VLMs) excel at tasks requiring joint understanding of visual and linguistic information. A particularly promising yet under-explored application for these models lies in answering questions based on various kinds of maps. This study investigates the efficacy of VLMs in answering questions based on choropleth maps, which are widely used for data analysis and representation. To facilitate and encourage research in this area, we introduce a novel map-based question-answering benchmark, consisting of maps from three geographical regions (United States, India, China), each containing 1000 questions. Our benchmark incorporates 43 diverse question templates, requiring nuanced understanding of relative spatial relationships, intricate map features, and complex reasoning. It also includes maps with discrete and continuous values, encompassing variations in color-mapping,…
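As a rough illustration of the kind of template-based question the abstract describes, the sketch below fills a question template and computes a ground-truth answer from the data underlying a discrete choropleth. It is a minimal sketch under stated assumptions, not the benchmark's actual pipeline: the region names, values, bin edges, template strings, and every function name are hypothetical.

```python
from bisect import bisect_right

# Hypothetical toy data: region -> value (not taken from MAPWise).
VALUES = {"A": 3.2, "B": 7.9, "C": 5.1, "D": 9.4}

# Hypothetical legend bin edges for a discrete choropleth.
BIN_EDGES = [0.0, 4.0, 8.0, 12.0]


def legend_bin(value, edges=BIN_EDGES):
    """Return the (low, high) legend range that contains value."""
    i = bisect_right(edges, value) - 1
    i = max(0, min(i, len(edges) - 2))  # clamp to a valid bin index
    return edges[i], edges[i + 1]


def fill_template(template, **slots):
    """Instantiate a question template with concrete slot values."""
    return template.format(**slots)


def answer_highest(values):
    """Ground truth for 'which region has the highest value?'."""
    return max(values, key=values.get)


def answer_regions_in_range(values, low, high):
    """Ground truth for 'which regions fall in the range [low, high)?'."""
    return sorted(r for r, v in values.items() if low <= v < high)


if __name__ == "__main__":
    low, high = legend_bin(VALUES["B"])
    question = fill_template(
        "Which regions fall in the {low}-{high} range of the legend?",
        low=low,
        high=high,
    )
    print(question)
    print(answer_regions_in_range(VALUES, low, high))
    print(answer_highest(VALUES))
```

A continuous-legend map would replace the fixed bin edges with a color scale, so the model has to estimate the range from the color bar instead of reading a labeled class.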
Taxonomy
Topics: Data Management and Algorithms · Geographic Information Systems Studies · Semantic Web and Ontologies
