Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso M de Melo, Jieneng Chen, Alan Yuille

TL;DR
Spatial457 is a new synthetic benchmark dataset designed to evaluate large multimodal models' 6D spatial reasoning abilities, revealing performance declines with increasing complexity and uncovering attribute biases.
Contribution
The paper introduces Spatial457, a comprehensive synthetic dataset and evaluation framework for 6D spatial reasoning in multimodal models, addressing limitations of existing 2D-focused benchmarks.
Findings
Models show performance decline as task complexity increases.
3D and 6D spatial reasoning are particularly challenging for current models.
Prediction biases are observed across different spatial attributes.
Abstract
Although large multimodal models (LMMs) have demonstrated remarkable capabilities in visual scene interpretation and reasoning, their capacity for complex and precise 3-dimensional spatial reasoning remains uncertain. Existing benchmarks focus predominantly on 2D spatial understanding and lack a framework to comprehensively evaluate 6D spatial reasoning across varying complexities. To address this limitation, we present Spatial457, a scalable and unbiased synthetic dataset designed around 4 key capabilities for spatial reasoning: multi-object recognition, 2D location, 3D location, and 3D orientation. We develop a cascading evaluation structure, constructing 7 question types across 5 difficulty levels that range from basic single-object recognition to our newly proposed complex 6D spatial reasoning tasks. We evaluated various large multimodal models (LMMs) on Spatial457, observing a general…
Taxonomy
Topics: Semantic Web and Ontologies · Speech and dialogue systems · AI-based Problem Solving and Planning
