EarthSpatialBench: Benchmarking Spatial Reasoning Capabilities of Multimodal LLMs on Earth Imagery

Zelin Xu; Yupu Zhang; Saugat Adhikari; Saiful Islam; Tingsong Xiao; Zibo Liu; Shigang Chen; Da Yan; Zhe Jiang

arXiv:2602.15918 · cs.CV · February 19, 2026

TL;DR

EarthSpatialBench is a new comprehensive benchmark designed to evaluate the spatial reasoning abilities of multimodal large language models on Earth imagery, covering distance, direction, topology, and complex geometries.

Contribution

The paper introduces a large-scale, diverse dataset of over 325K questions that assess multiple aspects of spatial reasoning on Earth imagery, filling gaps left by previous benchmarks.

Findings

01 MLLMs show limitations in quantitative spatial reasoning.

02 Existing models struggle with topological and complex geometry queries.

03 The benchmark reveals specific areas for improvement in spatial understanding.

Abstract

Benchmarking spatial reasoning in multimodal large language models (MLLMs) has attracted growing interest in computer vision due to its importance for embodied AI and other agentic systems that require precise interaction with the physical world. However, spatial reasoning on Earth imagery has lagged behind, as it uniquely involves grounding objects in georeferenced images and quantitatively reasoning about distances, directions, and topological relations using both visual cues and vector geometry coordinates (e.g., 2D bounding boxes, polylines, and polygons). Existing benchmarks for Earth imagery primarily focus on 2D spatial grounding, image captioning, and coarse spatial relations (e.g., simple directional or proximity cues). They lack support for quantitative direction and distance reasoning, systematic topological relations, and complex object geometries beyond bounding boxes. To…
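To make the three kinds of reasoning named in the abstract concrete (distance, direction, and topological relations over vector coordinates), here is a minimal Python sketch for the simplest geometry type, 2D bounding boxes. It is an illustration only, not the benchmark's question generator or scoring code. The function names, the `(xmin, ymin, xmax, ymax)` pixel box layout, the north-up image orientation, and the `gsd` (ground sample distance, meters per pixel) parameter are all assumptions introduced for this example.

```python
import math

# Hypothetical box layout: (xmin, ymin, xmax, ymax) in pixel coordinates,
# with y growing downward and the image assumed to be north-up.

def centroid(box):
    """Return the (x, y) center of an axis-aligned box."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)

def distance_m(a, b, gsd):
    """Centroid-to-centroid distance in meters, given meters per pixel."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return math.hypot(bx - ax, by - ay) * gsd

def bearing_deg(a, b):
    """Compass bearing from a to b: 0 = north, 90 = east, clockwise."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    east = bx - ax
    north = ay - by  # flip sign because image y grows downward
    return math.degrees(math.atan2(east, north)) % 360.0

def topology(a, b):
    """Coarse relation of box a to box b (boundaries count as touching)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    if ax0 <= bx0 and ay0 <= by0 and ax1 >= bx1 and ay1 >= by1:
        return "contains"
    if bx0 <= ax0 and by0 <= ay0 and bx1 >= ax1 and by1 >= ay1:
        return "within"
    return "intersects"

if __name__ == "__main__":
    building = (0, 0, 10, 10)
    pond = (30, 0, 40, 10)
    print(distance_m(building, pond, gsd=0.5))
    print(bearing_deg(building, pond))
    print(topology(building, pond))
```

Polylines and polygons, which the abstract also lists, would need a real geometry library rather than these box-only checks.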

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Spatial Cognition and Navigation