FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks
Tanawan Premsri, Parisa Kordjamshidi

TL;DR
This paper introduces FoREST, a benchmark for evaluating large language models' understanding of the Frame of Reference in spatial reasoning, revealing performance gaps and proposing a spatial-guided prompting method to enhance comprehension.
Contribution
The paper presents the first dedicated FoR benchmark and a novel prompting method to improve spatial reasoning in LLMs.
Findings
LLMs show significant performance gaps across FoR classes.
Spatial-Guided prompting improves spatial reasoning accuracy.
FoREST reveals critical shortcomings in current LLM spatial understanding.
Abstract
Spatial reasoning is a fundamental aspect of human intelligence. One key concept in spatial cognition is the Frame of Reference, which identifies the perspective of spatial expressions. Despite its significance, FoR has received limited attention in AI models that need spatial intelligence. There is a lack of dedicated benchmarks and in-depth evaluation of large language models (LLMs) in this area. To address this issue, we introduce the Frame of Reference Evaluation in Spatial Reasoning Tasks (FoREST) benchmark, designed to assess FoR comprehension in LLMs. We evaluate LLMs on answering questions that require FoR comprehension and layout generation in text-to-image models using FoREST. Our results reveal a notable performance gap across different FoR classes in various LLMs, affecting their ability to generate accurate layouts for text-to-image generation. This highlights critical…
Peer Reviews
Decision: Submitted to ICLR 2025
The paper proposes a novel task that can potentially reveal the ability of LLMs in understanding spatial concepts.
1. It is unclear whether the proposed Spatial-Guided prompting technique helps “reduce FoR bias in LLMs”, as claimed in the paper's abstract, or just clarifies the category terms that LLMs are tasked to identify. Since the FoR classes (external intrinsic, external relative, etc.) are technical terms in cognitive studies that do not appear commonly in the internet data used for training LLMs, a clear and intuitive explanation of the terms is naturally important for solving this task. However the def
- **Novel Perspective**: Introduces an innovative approach to assessing spatial perception in large models, focusing on frames of reference (FoR).
- **Theoretical Support**: Draws on established spatial language literature to support the motivations and foundational concepts of FoR.
- **Insightful Analysis**: Offers valuable insights into both FoR identification and text-to-image mapping.
No dataset or code provided
1. The spatial ability of LLMs is an important yet less explored research topic. A scientific benchmark would contribute to this area.
2. This work conducts various experiments with a range of LLMs and provides in-depth analysis. It also verifies the proposed prompting method on a text-to-image task, adding to its value for real-world applications.
3. The paper is well organized and well written.
1. The dataset is purely synthetic and constructed from a limited number of textual templates. I have concerns about the FoR classification task given the template "<locatum> <spatial relation> <relatum> <perspective>". It seems hard to disentangle this task from the linguistic and common-sense reasoning of LLMs. For example, LLMs are able to determine whether the perspective is intrinsic or relative by analyzing the perspective template, and analyze the topology template to determine whether the locatum is ex
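The template quoted in the review above, "<locatum> <spatial relation> <relatum> <perspective>", can be illustrated with a minimal sketch of template-based generation. The word lists, the sentence wording, and the function names below are hypothetical placeholders chosen for illustration; they are not the benchmark's actual vocabulary or generation code.

```python
from itertools import product

# Hypothetical vocabularies (assumptions for illustration only).
LOCATA = ["cat", "ball"]
RELATIONS = ["to the left of", "in front of"]
RELATA = ["car", "tree"]
# Perspective phrases may mention the relatum, so they are format strings.
PERSPECTIVES = [
    "from the {relatum}'s perspective",
    "from the observer's perspective",
]


def fill_template(locatum, relation, relatum, perspective):
    """Fill '<locatum> <spatial relation> <relatum> <perspective>'."""
    phrase = perspective.format(relatum=relatum)
    return f"A {locatum} is {relation} a {relatum} {phrase}."


def generate_expressions():
    """Enumerate every combination of the four template slots."""
    return [
        fill_template(loc, rel, rlt, persp)
        for loc, rel, rlt, persp in product(
            LOCATA, RELATIONS, RELATA, PERSPECTIVES
        )
    ]


if __name__ == "__main__":
    for sentence in generate_expressions()[:4]:
        print(sentence)
```

With slots this small, the number of distinct sentences is just the product of the slot sizes, which may help explain the reviewer's concern that a limited template set leaves surface cues in the perspective phrase.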
Taxonomy
Topics: Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning
Methods: Softmax · Attention Is All You Need
