GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning
Zhisheng Tang, Mayank Kejriwal

TL;DR
This paper introduces GRASP, a large-scale grid-based benchmark for evaluating commonsense spatial reasoning in AI, revealing current models' struggles with planning in spatial tasks.
Contribution
The paper presents GRASP, a comprehensive benchmark with 16,000 environments for assessing spatial reasoning, and compares baseline algorithms with advanced LLMs, highlighting their limitations.
Findings
LLMs struggle to consistently solve spatial reasoning tasks
Classic algorithms like greedy search perform variably
GRASP provides a challenging testbed for future models
Abstract
Spatial reasoning, an important faculty of human cognition with many practical applications, is one of the core commonsense skills that is not purely language-based and, for satisfying (as opposed to optimal) solutions, requires some minimum degree of planning. Existing benchmarks of Commonsense Spatial Reasoning (CSR) tend to evaluate how Large Language Models (LLMs) interpret text-based spatial descriptions rather than directly evaluate a plan produced by the LLM in response to a spatial reasoning problem. In this paper, we construct a large-scale benchmark called GRASP, which consists of 16,000 grid-based environments where the agent is tasked with an energy collection problem. These environments include 100 grid instances instantiated using each of the 160 different grid settings, involving five different energy distributions, two modes of agent…
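To make the setup concrete, here is a minimal sketch of a grid-based energy collection task with a greedy baseline. It is an illustration under stated assumptions, not the benchmark's actual rules or code: the grid encoding, the step budget `max_steps`, the Manhattan-distance movement cost, and the function name `greedy_collect` are all hypothetical. Only the counts (100 instances for each of 160 settings) come from the abstract.

```python
from typing import List, Tuple

Cell = Tuple[int, int]

# Counts stated in the abstract: 100 instances for each of 160 grid settings.
INSTANCES_PER_SETTING = 100
NUM_SETTINGS = 160
TOTAL_ENVIRONMENTS = INSTANCES_PER_SETTING * NUM_SETTINGS  # 16,000


def manhattan(a: Cell, b: Cell) -> int:
    """Number of unit moves between two cells on an obstacle-free grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_collect(energy: List[List[int]], start: Cell, max_steps: int) -> int:
    """Hypothetical greedy baseline: repeatedly walk to the nearest energy cell.

    Assumes (not taken from the paper) that moving one cell costs one step,
    that there are no obstacles, and that entering a cell collects all of
    its energy.
    """
    grid = [row[:] for row in energy]  # copy so the caller's grid is untouched
    pos, steps_left = start, max_steps
    collected = grid[pos[0]][pos[1]]   # pick up whatever is on the start cell
    grid[pos[0]][pos[1]] = 0

    while True:
        targets = [(r, c) for r, row in enumerate(grid)
                   for c, value in enumerate(row) if value > 0]
        if not targets:
            break
        # Nearest target first; ties are broken by (row, column) order.
        target = min(targets, key=lambda t: (manhattan(pos, t), t))
        cost = manhattan(pos, target)
        if cost > steps_left:
            break
        # Every cell on a shortest path is closer than the nearest target,
        # so it holds no energy and the walk can be charged as a single jump.
        steps_left -= cost
        pos = target
        collected += grid[pos[0]][pos[1]]
        grid[pos[0]][pos[1]] = 0
    return collected


example_grid = [
    [0, 1, 0],
    [0, 0, 0],
    [2, 0, 3],
]
result_short = greedy_collect(example_grid, start=(0, 0), max_steps=4)
result_long = greedy_collect(example_grid, start=(0, 0), max_steps=6)
```

With a budget of four steps the greedy agent spends one step on the nearby unit of energy and cannot reach the richest cell afterwards, which is one way a myopic policy can fall short of a planned route.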
Taxonomy
Topics: Semantic Web and Ontologies · Constraint Satisfaction and Optimization · Data Management and Algorithms
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dropout · Weight Decay · Residual Connection · Multi-Head Attention
