Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning
Fangjun Li, David C. Hogg, Anthony G. Cohn

TL;DR
This paper introduces a realistic 3D simulation benchmark for evaluating qualitative spatial reasoning in language models, addressing limitations of previous simplified tests and highlighting current models' challenges with complex spatial tasks.
Contribution
The paper presents a novel, simulation-based benchmark for qualitative spatial reasoning, including a logic-based consistency tool and diverse real-world scenarios for more effective evaluation.
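To make the idea of a logic-based consistency check concrete, here is a minimal sketch. It is not the authors' tool. It assumes a hypothetical fact format of (subject, relation, object) triples and covers only two illustrative relations, left_of and right_of, treated as a strict order along a single axis. Chaining facts through the transitive closure also gives a small picture of what a multi-hop inference looks like.

```python
from itertools import product


def normalize(facts):
    """Rewrite each fact as a pair (a, b) meaning 'a is left of b'.

    The relation names 'left_of' and 'right_of' are assumptions made
    for this sketch, not the benchmark's actual vocabulary.
    """
    pairs = set()
    for subj, rel, obj in facts:
        if rel == "left_of":
            pairs.add((subj, obj))
        elif rel == "right_of":
            # 'a right_of b' is the converse of 'b left_of a'.
            pairs.add((obj, subj))
        else:
            raise ValueError(f"unsupported relation: {rel}")
    return pairs


def transitive_closure(pairs):
    """Compose pairs repeatedly until no new 'left of' facts appear."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new


def is_consistent(facts):
    """A set of facts is inconsistent if some object ends up left of itself."""
    closure = transitive_closure(normalize(facts))
    return all(a != b for a, b in closure)


facts = [("lamp", "left_of", "sofa"), ("table", "right_of", "sofa")]
inferred = transitive_closure(normalize(facts))
contradictory = facts + [("lamp", "right_of", "table")]
```

In this example, the two-hop fact that the lamp is left of the table appears in `inferred`, and `is_consistent(contradictory)` reports the contradiction introduced by the extra fact. A fuller tool would need composition tables for topological and distance relations as well, which this sketch does not attempt.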
Findings
Advanced LMs struggle with multi-hop spatial reasoning.
Models have difficulty interpreting mixed view descriptions.
The benchmark reveals specific strengths and limitations of current LMs.
Abstract
Spatial reasoning plays a vital role in both human cognition and machine intelligence, prompting new research into language models' (LMs) capabilities in this regard. However, existing benchmarks reveal shortcomings in evaluating qualitative spatial reasoning (QSR). These benchmarks typically present oversimplified scenarios or unclear natural language descriptions, hindering effective evaluation. We present a novel benchmark for assessing QSR in LMs, which is grounded in realistic 3D simulation data, offering a series of diverse room layouts with various objects and their spatial relationships. This approach provides a more detailed and context-rich narrative for spatial reasoning evaluation, diverging from traditional, toy-task-oriented scenarios. Our benchmark encompasses a broad spectrum of qualitative spatial relationships, including topological, directional, and distance…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Semantic Web and Ontologies
