SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models
Peiyao Jiang, Zequn Qin, Xi Li

TL;DR
SpatialText is a diagnostic framework that evaluates whether large language models genuinely construct and manipulate internal spatial representations; it reveals significant limitations in perspective transformation and local reasoning.
Contribution
The paper introduces SpatialText, a novel benchmark that isolates text-based spatial reasoning and systematically assesses models' internal spatial understanding capabilities.
Findings
Models excel at retrieving explicit spatial facts.
Models operate within global coordinate systems.
Models struggle with egocentric perspective transformation.
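To make the contrast between a global coordinate system and an egocentric perspective concrete, the following is a minimal illustrative sketch and not the benchmark's own method or data format. The function names (`to_egocentric`, `describe`), the 2D setup, and the heading convention (degrees counterclockwise from the global +x axis) are all assumptions made for this illustration. It shows how the same fixed object position maps to different egocentric relations as the observer turns.

```python
import math


def to_egocentric(obj_xy, observer_xy, heading_deg):
    """Express a global (allocentric) 2D point in an observer's frame.

    heading_deg: the observer's facing direction, in degrees measured
    counterclockwise from the global +x axis (assumed convention).
    Returns (forward, left): signed distances along the facing
    direction and to the observer's left.
    """
    dx = obj_xy[0] - observer_xy[0]
    dy = obj_xy[1] - observer_xy[1]
    theta = math.radians(heading_deg)
    # Rotate the global offset by -theta into the observer's frame.
    forward = dx * math.cos(theta) + dy * math.sin(theta)
    left = -dx * math.sin(theta) + dy * math.cos(theta)
    return forward, left


def describe(forward, left, eps=1e-9):
    """Turn egocentric offsets into a coarse verbal relation."""
    parts = []
    if forward > eps:
        parts.append("in front")
    elif forward < -eps:
        parts.append("behind")
    if left > eps:
        parts.append("left")
    elif left < -eps:
        parts.append("right")
    return " and ".join(parts) or "same position"


# Hypothetical scene: a lamp two units "north" (+y) of the observer.
lamp, observer = (2.0, 3.0), (2.0, 1.0)
for name, heading in [("east", 0), ("north", 90), ("west", 180), ("south", 270)]:
    relation = describe(*to_egocentric(lamp, observer, heading))
    print(f"facing {name}: lamp is {relation}")
```

The global fact (the lamp is north of the observer) never changes, while the egocentric answer depends on the heading. Answering such a question requires applying the frame change rather than retrieving the stated fact.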
Abstract
Genuine spatial reasoning relies on the capacity to construct and manipulate coherent internal spatial representations, often conceptualized as mental models, rather than merely processing surface linguistic associations. While large language models exhibit advanced capabilities across various domains, existing benchmarks fail to isolate this intrinsic spatial cognition from statistical language heuristics. Furthermore, multimodal evaluations frequently conflate genuine spatial reasoning with visual perception. To systematically investigate whether models construct flexible spatial mental models, we introduce SpatialText, a theory-driven diagnostic framework. Rather than functioning simply as a dataset, SpatialText isolates text-based spatial reasoning through a dual-source methodology. It integrates human-annotated descriptions of real 3D indoor environments, which capture natural…
Taxonomy
Topics: Spatial Cognition and Navigation · Constraint Satisfaction and Optimization · Multimodal Machine Learning Applications
