SpatialTree: How Spatial Abilities Branch Out in MLLMs
Yuxi Xiao, Longfei Li, Shen Yan, Xinhang Liu, Sida Peng, Yunchao Wei, Xiaowei Zhou, Bingyi Kang

TL;DR
SpatialTree introduces a hierarchical framework inspired by cognitive science to evaluate and enhance spatial abilities in multimodal large language models, revealing interdependencies and transfer dynamics across different ability levels.
Contribution
The paper presents the first capability-centric hierarchy and benchmark for spatial abilities in MLLMs, along with insights into transfer dynamics and a novel reinforcement learning (RL) strategy for improvement.
Findings
L1 skills are largely orthogonal, whereas higher-level skills are strongly correlated.
Targeted supervised fine-tuning yields negative transfer within L1 but positive transfer from lower to higher levels.
The auto-think strategy improves performance across all hierarchy levels.
Abstract
Cognitive science suggests that spatial ability develops progressively, from perception to reasoning and interaction. Yet in multimodal LLMs (MLLMs), this hierarchy remains poorly understood, as most studies focus on a narrow set of tasks. We introduce SpatialTree, a cognitive-science-inspired hierarchy that organizes spatial abilities into four levels: low-level perception (L1), mental mapping (L2), simulation (L3), and agentic competence (L4). Based on this taxonomy, we construct the first capability-centric hierarchical benchmark, thoroughly evaluating mainstream MLLMs across 27 sub-abilities. The evaluation results reveal a clear structure: L1 skills are largely orthogonal, whereas higher-level skills are strongly correlated, indicating increasing interdependency. Through targeted supervised fine-tuning, we uncover a surprising transfer dynamic: negative transfer within L1, but strong…
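The "orthogonal versus correlated" structure described in the abstract can be pictured with a small sketch. The abstract does not say which statistic the authors use, so Pearson correlation across per-model scores is an assumption here, as are the names `pearson` and `mean_abs_correlation`. The toy numbers are hypothetical placeholders chosen to show the two extremes; they are not results from the paper.

```python
from itertools import combinations
from math import sqrt

# The four levels as named in the abstract.
LEVELS = {
    "L1": "low-level perception",
    "L2": "mental mapping",
    "L3": "simulation",
    "L4": "agentic competence",
}

def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(vx * vy)

def mean_abs_correlation(scores):
    """Average |r| over all pairs of sub-abilities.

    `scores` maps a sub-ability name to a list of scores, one per model.
    Values near 0 suggest largely independent skills; values near 1
    suggest strongly coupled skills.
    """
    pairs = list(combinations(scores, 2))
    return sum(abs(pearson(scores[a], scores[b])) for a, b in pairs) / len(pairs)

# Hypothetical placeholder scores for four imaginary models (not paper data).
toy_uncorrelated = {"skill_a": [1, 2, 3, 4], "skill_b": [2, 4, 4, 2]}
toy_correlated = {"skill_x": [1, 2, 3, 4], "skill_y": [2, 4, 6, 8]}

print(mean_abs_correlation(toy_uncorrelated))
print(mean_abs_correlation(toy_correlated))
```

Applied to real benchmark scores, a statistic of this kind would be computed separately for the sub-abilities at each level and then compared across levels.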
Taxonomy
Topics: Spatial Cognition and Navigation · Visual and Cognitive Learning Processes · Cognitive and developmental aspects of mathematical skills
