CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding

Huy Quang Ung; Guillaume Habault; Yasutaka Nishimura; Hao Niu; Roberto Legaspi; Tomoki Oya; Ryoichi Kojima; Masato Taya; Chihiro Ono; Atsunori Minamikawa; Yan Liu

arXiv:2512.03558·cs.CV·December 4, 2025

TL;DR

CartoMapQA is a new benchmark dataset designed to evaluate vision-language models' ability to understand and interpret cartographic maps through diverse question-answering tasks, highlighting current challenges and guiding future improvements.

Contribution

This paper introduces CartoMapQA, the first comprehensive dataset for assessing how well large vision-language models (LVLMs) understand maps, covering symbol recognition, information extraction, scale interpretation, and reasoning.

Findings

01 Models struggle with map-specific semantics.

02 Models show limited geospatial reasoning capabilities.

03 OCR errors significantly affect performance.

Abstract

The rise of Large Visual-Language Models (LVLMs) has unlocked new possibilities for seamlessly integrating visual and textual information. However, their ability to interpret cartographic maps remains largely unexplored. In this paper, we introduce CartoMapQA, a benchmark specifically designed to evaluate LVLMs' understanding of cartographic maps through question-answering tasks. The dataset includes over 2,000 samples, each composed of a cartographic map, a question (with open-ended or multiple-choice answers), and a ground-truth answer. These tasks span key low-, mid-, and high-level map interpretation skills, including symbol recognition, embedded information extraction, scale interpretation, and route-based reasoning. Our evaluation of both open-source and proprietary LVLMs reveals persistent challenges: models frequently struggle with map-specific semantics, exhibit limited geospatial…
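The sample layout described in the abstract (a map, a question with open-ended or multiple-choice answers, and a ground-truth answer) could be represented roughly as follows. This is a minimal sketch, not the dataset's actual schema: the class name `MapQASample`, its field names, and the exact-match scoring in `exact_match_accuracy` are all hypothetical and chosen only for illustration. The paper's own evaluation metrics may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MapQASample:
    """One hypothetical benchmark record; field names are assumptions."""
    map_image_path: str                       # path to the cartographic map image
    question: str                             # natural-language question about the map
    answer: str                               # ground-truth answer
    choices: Optional[Sequence[str]] = None   # None means the question is open-ended

    @property
    def is_multiple_choice(self) -> bool:
        return self.choices is not None


def exact_match_accuracy(samples: Sequence[MapQASample],
                         predictions: Sequence[str]) -> float:
    """Fraction of predictions equal to the ground truth,
    ignoring case and surrounding whitespace (an assumed metric)."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    if not samples:
        return 0.0
    hits = sum(
        s.answer.strip().lower() == p.strip().lower()
        for s, p in zip(samples, predictions)
    )
    return hits / len(samples)
```

Exact matching suits multiple-choice answers; open-ended answers such as distances or routes would likely need a tolerance-based or task-specific comparison instead.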

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications