Grid Spatial Understanding: A Dataset for Textual Spatial Reasoning over Grids, Embodied Settings, and Coordinate Structures
Risham Sidhu, Julia Hockenmaier

TL;DR
This paper introduces GSU, a dataset for evaluating large language models' spatial reasoning over grids, focusing on navigation, object localization, and structure composition without visual inputs.
Contribution
The paper presents GSU, a novel text-only dataset for assessing spatial reasoning in LLMs across core tasks, highlighting current model limitations and potential fine-tuning solutions.
Findings
Models understand basic grid concepts but struggle with frames of reference relative to an embodied agent.
Exposure to a visual modality does not give VLMs a generalizable understanding of 3D space that they can use for these tasks.
Fine-tuning small models can approach frontier model performance.
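The reference-frame difficulty can be made concrete with a small sketch. The same agent-relative word ("left") maps to a different absolute direction depending on which way the agent faces, so following egocentric instructions means tracking a heading as well as a position. The function name `follow_egocentric`, the command vocabulary, and the convention that north is +y and east is +x are hypothetical illustrations, not the GSU data format.

```python
# Hypothetical convention: north = +y, east = +x; headings listed clockwise.
HEADINGS = ["north", "east", "south", "west"]
STEP = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}


def follow_egocentric(start, heading, commands):
    """Apply agent-relative commands ('left', 'right', 'forward') on an unbounded grid.

    Returns the final (x, y) position and the final absolute heading.
    """
    x, y = start
    h = HEADINGS.index(heading)
    for cmd in commands:
        if cmd == "left":
            h = (h - 1) % 4  # turn 90 degrees counter-clockwise
        elif cmd == "right":
            h = (h + 1) % 4  # turn 90 degrees clockwise
        elif cmd == "forward":
            dx, dy = STEP[HEADINGS[h]]  # move one cell along the current heading
            x, y = x + dx, y + dy
        else:
            raise ValueError(f"unknown command: {cmd}")
    return (x, y), HEADINGS[h]


if __name__ == "__main__":
    # Facing east at (0, 0): forward, turn left (now facing north), forward twice.
    print(follow_egocentric((0, 0), "east", ["forward", "left", "forward", "forward"]))
```

Under these assumptions the call above ends at (1, 2) facing north, while an agent facing south that turns "left" ends up facing east: the word is fixed but its absolute meaning depends on the agent's heading.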
Abstract
We introduce GSU, a text-only grid dataset for evaluating the spatial reasoning capabilities of LLMs on three core tasks: navigation, object localization, and structure composition. By forgoing visual inputs, and thereby isolating spatial reasoning from perception, we show that while most models grasp basic grid concepts, they struggle with frames of reference relative to an embodied agent and with identifying 3D shapes from coordinate lists. We also find that exposure to a visual modality does not give VLMs a generalizable understanding of 3D space that they can use for these tasks. Finally, we show that while the very latest frontier models can solve the provided tasks (though harder variants may still stump them), fully fine-tuning a small LM or LoRA fine-tuning a small LLM shows potential to match frontier model performance, suggesting an avenue for specialized embodied agents.
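To illustrate what "identifying 3D shapes from coordinate lists" asks of a model, here is a minimal sketch that decides whether a list of integer (x, y, z) cells exactly fills an axis-aligned box, and whether that box is a cube. The function `classify_box` and its return labels are hypothetical; the shapes and coordinate formats actually used in GSU may differ.

```python
from itertools import product


def classify_box(coords):
    """Classify integer (x, y, z) cells as an axis-aligned solid box.

    Returns ('cube', dims) or ('cuboid', dims), where dims is the box size
    along each axis, or None if the cells do not exactly fill a box.
    """
    cells = set(coords)
    if not cells:
        return None
    xs, ys, zs = zip(*cells)
    lo = (min(xs), min(ys), min(zs))
    hi = (max(xs), max(ys), max(zs))
    dims = tuple(h - l + 1 for l, h in zip(lo, hi))
    # Every cell of the bounding box must be present: no holes, no extras.
    expected = set(product(*(range(l, h + 1) for l, h in zip(lo, hi))))
    if cells != expected:
        return None
    label = "cube" if len(set(dims)) == 1 else "cuboid"
    return label, dims


if __name__ == "__main__":
    print(classify_box(product(range(2), repeat=3)))          # full 2x2x2 block
    print(classify_box([(0, 0, 0), (1, 0, 0), (2, 0, 0)]))    # 3x1x1 row
```

The check is purely set-based: a model (or program) must infer the bounding extent from the coordinates and confirm that nothing is missing, which is the kind of composition step the abstract reports models struggling with.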
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Spatial Cognition and Navigation
