Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation

Shuo Lu, Jianjie Cheng, Yinuo Xu, Yongcan Yu, Lijun Sheng, Peijie Wang, Siru Jiang, Yongguan Hu, Run Ling, Yihua Shao, Ao Ma, Wei Feng, Lingxiao He, Meng Wang, Qianlong Xie, Xingxing Wang, Nicu Sebe, Ran He, and Jian Liang

arXiv:2602.11635 · cs.AI · April 9, 2026

TL;DR

This paper evaluates the spatial reasoning abilities of multimodal large language models, introduces a comprehensive dataset for this task, and demonstrates that current models significantly lag behind human performance.

Contribution

The paper presents MathSpatial, the first large-scale dataset specifically designed to assess and improve mathematical spatial reasoning in MLLMs.

Findings

01

Most MLLMs perform poorly on mathematical spatial reasoning tasks; GPT-5 trails human performance by 35%.

02

Training on MathSpatial-Corpus improves models' spatial reasoning abilities.

03

MathSpatial is publicly available for further research.

Abstract

Multimodal large language models (MLLMs) have achieved strong performance on perception-oriented tasks, yet their ability to perform mathematical spatial reasoning, defined as the capacity to parse and manipulate two- and three-dimensional relations, remains unclear. Humans easily solve textbook-style spatial reasoning problems with over 95% accuracy, but we find that most leading MLLMs fail to reach even 60% on the same tasks. This striking gap highlights spatial reasoning as a fundamental weakness of current models. To investigate this gap, we present MathSpatial, the first large-scale and systematic dataset resource dedicated to mathematical spatial reasoning in MLLMs. MathSpatial provides two complementary subsets: (i) MathSpatial-Bench, a rigorously curated evaluation set of 2,000 problems spanning 3 categories and 11 subtypes, designed to isolate spatial…

Code & Models

Repositories

https://shuolucs.github.io/MathSpatial
