Stuck in the Matrix: Probing Spatial Reasoning in Large Language Models
Maggie Bai, Ava Kim Cohen, Eleanor Koss, Charlie Lichtenbaum

TL;DR
This study evaluates the spatial reasoning abilities of large language models across five grid-based tasks. Accuracy declines sharply as grid size and complexity increase, which exposes limits in the models' spatial understanding and points to the need for improved models.
Contribution
The paper introduces a suite of five grid-based spatial reasoning tasks for LLMs and systematically analyzes how performance changes as grid dimensions grow, exposing current limitations and guiding future research.
Findings
Accuracy drops by 42.7% on average as task scale increases.
Models struggle with multi-step spatial reasoning beyond simple pattern recognition.
On the hardest-hit task, scaling complexity causes accuracy to decline by as much as 84%.
Abstract
This paper explores the spatial reasoning capability of large language models (LLMs) over textual input through a suite of five tasks aimed at probing their spatial understanding and computational abilities. The models were tested on both fundamental spatial reasoning and multi-step problem-solving within structured grid-based environments using tasks such as quadrant identification, geometric transformations, distance evaluation, word searches, and tile sliding. Each task was scaled in complexity by increasing grid dimensions, requiring models to extend beyond simple pattern recognition into abstract spatial reasoning. Our results reveal that while LLMs demonstrate moderate success on all tasks at low complexity and small grid sizes, performance drops off rapidly as scale increases, with an average loss in accuracy of 42.7% and a maximum loss of 84%. Every test that began with over 50%…
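To make the grid-scaling idea concrete, here is a minimal Python sketch of how a quadrant identification task could be generated at different grid sizes and scored. This is an illustration under stated assumptions, not the authors' benchmark: the text encoding (`.` for empty cells, `X` for the marked cell), the quadrant labels, the restriction to even grid sizes, and all function names (`make_quadrant_task`, `quadrant_of`, `oracle`, `accuracy`) are hypothetical.

```python
import random


def quadrant_of(row, col, n):
    """Return the quadrant label of cell (row, col) in an n x n grid (assumed labels)."""
    vertical = "top" if row < n // 2 else "bottom"
    horizontal = "left" if col < n // 2 else "right"
    return f"{vertical}-{horizontal}"


def make_quadrant_task(n, rng):
    """Build one n x n text grid with a single 'X'; return (grid_text, answer)."""
    if n % 2 != 0:
        # Assumption: even sizes only, so every cell lies in exactly one quadrant.
        raise ValueError("n must be even")
    row, col = rng.randrange(n), rng.randrange(n)
    grid = [["." for _ in range(n)] for _ in range(n)]
    grid[row][col] = "X"
    text = "\n".join(" ".join(cells) for cells in grid)
    return text, quadrant_of(row, col, n)


def oracle(text):
    """Reference solver that parses the grid text; stands in for a model call."""
    rows = [line.split() for line in text.splitlines()]
    for r, cells in enumerate(rows):
        if "X" in cells:
            return quadrant_of(r, cells.index("X"), len(rows))
    raise ValueError("no marked cell found")


def accuracy(predict, n, trials=100, seed=0):
    """Fraction of randomly generated n x n tasks that `predict` answers correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        text, answer = make_quadrant_task(n, rng)
        if predict(text) == answer:
            correct += 1
    return correct / trials
```

Replacing `oracle` with a function that sends the grid text to a model and parses its reply, then calling `accuracy` at increasing values of `n`, would give an accuracy-versus-grid-size curve of the kind the abstract describes.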
Taxonomy
Topics: Spatial Cognition and Navigation · Constraint Satisfaction and Optimization · Multimodal Machine Learning Applications
