DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning
Lachlan McPheat, Navdeep Kaur, Robert Blackwell, Alessandra Russo, Anthony G. Cohn, Pranava Madhyastha

TL;DR
DecompSR is a large, procedurally generated benchmark dataset designed to analyze and evaluate the compositional spatial reasoning abilities of large language models, with independent control over various aspects of compositionality.
Contribution
The paper introduces DecompSR, a correct-by-construction dataset that allows independent variation of compositionality factors for probing LLM spatial reasoning.
Findings
LLMs struggle with productive and systematic generalization in spatial reasoning.
LLMs are comparatively robust to linguistic variation.
DecompSR enables fine-grained analysis of reasoning abilities.
Abstract
We introduce DecompSR (decomposed spatial reasoning), a large benchmark dataset (over 5 million datapoints) and generation framework designed to analyse compositional spatial reasoning ability. The generation of DecompSR allows users to independently vary several aspects of compositionality, namely: productivity (reasoning depth), substitutivity (entity and linguistic variability), overgeneralisation (input order, distractors) and systematicity (novel linguistic elements). DecompSR is built procedurally so that it is correct by construction, and its correctness is independently verified using a symbolic solver. DecompSR is comprehensively benchmarked across a host of Large Language Models (LLMs) where we show that LLMs struggle with productive and systematic generalisation in spatial reasoning tasks whereas they are more robust to linguistic…
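To make the idea of a correct-by-construction generator with independent verification more concrete, here is a minimal Python sketch. It is not the DecompSR code: the four-direction vocabulary, the entity names (`E0`, `D0`, ...), and the functions `generate_example` and `solve` are all hypothetical, and a simple coordinate propagation stands in for the symbolic solver mentioned in the abstract. The `depth` parameter loosely mirrors productivity, while `n_distractors` and `shuffle` loosely mirror the overgeneralisation controls.

```python
import random
from collections import deque

# Hypothetical relation vocabulary: relation name -> (dx, dy) grid offset.
OFFSETS = {"left": (-1, 0), "right": (1, 0), "above": (0, 1), "below": (0, -1)}


def generate_example(depth, n_distractors=0, shuffle=False, seed=0):
    """Build a chain of `depth` facts; the answer is tracked while generating."""
    rng = random.Random(seed)
    rels = list(OFFSETS)
    chain = [f"E{i}" for i in range(depth + 1)]
    facts, x, y = [], 0, 0
    for prev, cur in zip(chain, chain[1:]):
        rel = rng.choice(rels)
        dx, dy = OFFSETS[rel]
        x, y = x + dx, y + dy
        facts.append((cur, rel, prev))  # reads as "cur is <rel> of prev"
    # Distractors hang off chain entities and never lie on the reasoning path.
    for j in range(n_distractors):
        facts.append((f"D{j}", rng.choice(rels), rng.choice(chain)))
    if shuffle:
        rng.shuffle(facts)  # vary the input order without changing the answer
    return {"facts": facts, "source": chain[0], "target": chain[-1],
            "answer": (x, y)}


def solve(facts, source, target):
    """Independent check: propagate coordinates outward from `source`."""
    neighbours = {}
    for a, rel, b in facts:
        dx, dy = OFFSETS[rel]
        neighbours.setdefault(b, []).append((a, dx, dy))
        neighbours.setdefault(a, []).append((b, -dx, -dy))
    pos = {source: (0, 0)}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for other, dx, dy in neighbours.get(node, []):
            if other not in pos:
                pos[other] = (pos[node][0] + dx, pos[node][1] + dy)
                queue.append(other)
    return pos[target]


example = generate_example(depth=4, n_distractors=2, shuffle=True, seed=7)
# The generator's tracked answer and the independent solver must agree.
assert solve(example["facts"], example["source"], example["target"]) == example["answer"]
```

In this sketch the answer is known at generation time because it is accumulated while the chain is built, and the solver recomputes it from the facts alone, so any disagreement would point to a bug in one of the two.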
