SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence

Haoning Wu; Xiao Huang; Yaohui Chen; Ya Zhang; Yanfeng Wang; Weidi Xie

arXiv:2505.17012 · cs.CV · April 14, 2026

TL;DR

This paper introduces SpatialScore, a comprehensive benchmark for evaluating the spatial intelligence of multimodal large language models (MLLMs), along with a dataset (SpatialCorpus) and a multi-agent system (SpatialAgent) for improving spatial reasoning.

Contribution

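It presents what the authors describe as the most comprehensive and diverse spatial intelligence benchmark to date, evaluates 49 MLLMs on it, and offers two complementary remedies: SpatialCorpus, a dataset that improves model performance, and SpatialAgent, a multi-agent system that enhances spatial reasoning without additional training.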

Findings

01

49 MLLMs evaluated, revealing persistent challenges and a substantial gap to human-level spatial intelligence

02

SpatialCorpus improves model performance significantly

03

SpatialAgent enhances reasoning without extra training

Abstract

Existing evaluations of multimodal large language models (MLLMs) on spatial intelligence are typically fragmented and limited in scope. In this work, we aim to conduct a holistic assessment of the spatial understanding capabilities of modern MLLMs and propose complementary data-driven and agent-based solutions. Specifically, we make the following contributions: (i) we introduce SpatialScore, to our knowledge, the most comprehensive and diverse benchmark for multimodal spatial intelligence to date. It covers multiple visual data types, input modalities, and question-answering formats, and contains approximately 5K manually verified samples spanning 30 distinct tasks; (ii) using SpatialScore, we extensively evaluate 49 representative MLLMs, revealing persistent challenges and a substantial gap between current models and human-level spatial intelligence; (iii) to advance model…

Code & Models

Repositories

haoningwu3639/SpatialScore
github

Models

Datasets

haoningwu/SpatialScore
dataset · 299 dl
