DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning

Lachlan McPheat; Navdeep Kaur; Robert Blackwell; Alessandra Russo; Anthony G. Cohn; Pranava Madhyastha

arXiv:2511.02627·cs.AI·April 15, 2026

TL;DR

DecompSR is a large, procedurally generated benchmark dataset designed to analyze and evaluate the compositional spatial reasoning abilities of large language models, with independent control over various aspects of compositionality.

Contribution

The paper introduces DecompSR, a correct-by-construction dataset that allows independent variation of compositionality factors for probing LLM spatial reasoning.

Findings

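01. LLMs struggle with productive and systematic generalization in spatial reasoning, that is, with greater reasoning depth and with novel linguistic elements.

02. LLMs are more robust to linguistic variation than to those two factors.

03. DecompSR enables fine-grained analysis of reasoning abilities, because each aspect of compositionality can be varied independently.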

Abstract

We introduce DecompSR (decomposed spatial reasoning), a large benchmark dataset (over 5 million datapoints) and generation framework designed to analyse compositional spatial reasoning ability. The generation of DecompSR allows users to independently vary several aspects of compositionality, namely: productivity (reasoning depth), substitutivity (entity and linguistic variability), overgeneralisation (input order, distractors) and systematicity (novel linguistic elements). DecompSR is built procedurally in a manner that makes it correct by construction, and its correctness is independently verified using a symbolic solver. DecompSR is comprehensively benchmarked across a host of Large Language Models (LLMs), where we show that LLMs struggle with productive and systematic generalisation in spatial reasoning tasks whereas they are more robust to linguistic…
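
To make "correct by construction" and "independently verified" concrete, here is a minimal Python sketch. It is not the authors' generator or solver. The direction vocabulary, the entity names (E0, E1, ...), and the functions generate and solve are all hypothetical. It builds a chain of directional facts whose answer is known from the construction, optionally shuffles the fact order, and then recomputes the answer with a separate propagation routine.

```python
import random

# Hypothetical relation vocabulary; the dataset's actual relations are not listed here.
DIRECTIONS = {"left": (-1, 0), "right": (1, 0), "above": (0, 1), "below": (0, -1)}

def generate(depth, seed=0, shuffle=False):
    """Build `depth` chained facts (b, d, a), read as 'b is d of a'.

    The answer (offset of the last entity from the first) is accumulated
    while the chain is built, so it is correct by construction.
    """
    rng = random.Random(seed)
    names = [f"E{i}" for i in range(depth + 1)]
    facts, x, y = [], 0, 0
    for a, b in zip(names, names[1:]):
        d = rng.choice(sorted(DIRECTIONS))
        dx, dy = DIRECTIONS[d]
        x, y = x + dx, y + dy
        facts.append((b, d, a))
    if shuffle:
        rng.shuffle(facts)  # vary input order without changing the answer
    return facts, (names[-1], names[0]), (x, y)

def solve(facts, query):
    """Independent check: propagate coordinates outward from the reference entity."""
    target, ref = query
    pos = {ref: (0, 0)}
    pending = list(facts)
    while pending:
        progressed = False
        for fact in list(pending):
            b, d, a = fact
            dx, dy = DIRECTIONS[d]
            if a in pos and b not in pos:
                pos[b] = (pos[a][0] + dx, pos[a][1] + dy)
            elif b in pos and a not in pos:
                pos[a] = (pos[b][0] - dx, pos[b][1] - dy)
            else:
                continue  # not yet resolvable (or already known); revisit later
            pending.remove(fact)
            progressed = True
        if not progressed:
            break  # nothing more can be derived
    return pos[target]

facts, query, answer = generate(depth=5, seed=1, shuffle=True)
assert solve(facts, query) == answer
```

In this sketch, depth stands in for productivity (reasoning depth) and shuffle stands in for the input-order factor. The other factors named in the abstract, such as distractors and linguistic variation, are not modelled.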
