SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

Peiran Xu; Sudong Wang; Yao Zhu; Jianing Li; Gege Qi; Yunjian Zhang

arXiv:2511.21471 · cs.AI · May 8, 2026

TL;DR

This paper introduces SpatialBench, a comprehensive benchmark and hierarchical framework for evaluating the spatial cognition abilities of multimodal large language models across five levels of complexity.

Contribution

It proposes a hierarchical spatial cognition framework, constructs a detailed benchmark with 15 tasks, and introduces a unified metric for assessing spatial reasoning in MLLMs.
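To make the idea of a level-structured benchmark concrete, here is a minimal sketch of how per-task scores could be rolled up into per-level scores and a single overall number. It is not the paper's unified metric, which this page does not describe. The function name `aggregate_by_level`, the task names, the scores, and the task-to-level mapping are all hypothetical, and plain macro-averaging is an assumption chosen for simplicity.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_level(task_scores, task_to_level):
    """Macro-average task scores within each level, then across levels.

    Illustrative only: the averaging scheme is an assumption, not the
    metric defined in the SpatialBench paper.
    """
    per_level = defaultdict(list)
    for task, score in task_scores.items():
        per_level[task_to_level[task]].append(score)

    # One mean per cognitive level, ordered by level index.
    level_means = {level: mean(scores) for level, scores in sorted(per_level.items())}

    # Each level counts equally, regardless of how many tasks it contains.
    overall = mean(list(level_means.values()))
    return level_means, overall


# Hypothetical scores and mapping, purely for demonstration.
example_scores = {"task_a": 1.0, "task_b": 0.5, "task_c": 0.25}
example_levels = {"task_a": 1, "task_b": 1, "task_c": 2}

level_means, overall = aggregate_by_level(example_scores, example_levels)
print(level_means, overall)
```

Averaging within a level before averaging across levels keeps a level with many tasks from dominating the overall score. Whether the actual benchmark weights levels this way is not stated here.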

Findings

01 Models excel in perceptual grounding but struggle with symbolic reasoning.

02 Performance varies significantly across different cognitive levels.

03 Humans outperform models in goal-directed spatial abstraction.

Abstract

Spatial cognition is fundamental to real-world multimodal intelligence, allowing models to effectively interact with the physical environment. While multimodal large language models (MLLMs) have made significant strides, existing benchmarks often oversimplify spatial cognition, reducing it to a single-dimensional metric, which fails to capture the hierarchical structure and interdependence of spatial abilities. To address this gap, we propose a hierarchical spatial cognition framework that decomposes spatial intelligence into five progressively complex levels from basic observation to high-level planning. Building upon this taxonomy, we construct SpatialBench, a large-scale, fine-grained benchmark covering 15 tasks aligned with these cognitive levels. To provide a unified evaluation across heterogeneous tasks, we further introduce a high-level capability-oriented metric that reliably…

Code & Models

Datasets

XPR2004/SpatialBench
dataset · 317 downloads
