When More Is Less: A Systematic Analysis of Spatial and Commonsense Information for Visual Spatial Reasoning

Muku Akasaka; Soyeon Caren Han

arXiv:2602.21619 · cs.CL · February 26, 2026

Open Access

TL;DR

This paper systematically analyzes how different types and amounts of spatial and commonsense information affect visual spatial reasoning in vision-language models, revealing that more information often does not improve performance.

Contribution

The paper provides a detailed, hypothesis-driven analysis of information injection strategies in visual spatial reasoning (VSR), offering practical insights into when and how such information enhances reasoning.

Findings

1. Single spatial cues outperform multi-context aggregation.
2. Excessive or irrelevant commonsense knowledge degrades performance.
3. CoT prompting improves accuracy only with precise spatial grounding.

Abstract

Visual spatial reasoning (VSR) remains challenging for modern vision-language models (VLMs), despite advances in multimodal architectures. A common strategy is to inject additional information at inference time, such as explicit spatial cues, external commonsense knowledge, or chain-of-thought (CoT) reasoning instructions. However, it remains unclear when such information genuinely improves reasoning and when it introduces noise. In this paper, we conduct a hypothesis-driven analysis of information injection for VSR across three representative VLMs and two public benchmarks. We examine (i) the type and number of spatial contexts, (ii) the amount and relevance of injected commonsense knowledge, and (iii) the interaction between spatial grounding and CoT prompting. Our results reveal a consistent pattern: more information does not necessarily yield better reasoning. Targeted single…
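To make the three injection axes in the abstract concrete, the following is a minimal sketch of how an inference-time prompt might be assembled from optional spatial cues, optional commonsense facts, and an optional CoT instruction. It is not the authors' implementation: the function name `build_prompt`, its parameters, the section labels, and the CoT wording are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence

# Hypothetical CoT wording; the paper's actual instruction is not given here.
COT_INSTRUCTION = "Let's think step by step."


def build_prompt(
    question: str,
    spatial_cues: Sequence[str] = (),
    commonsense: Sequence[str] = (),
    use_cot: bool = False,
    max_commonsense: Optional[int] = None,
) -> str:
    """Assemble a text prompt from optional injected information.

    All section labels below are illustrative assumptions.
    """
    parts = []

    # (i) Type and number of spatial contexts: pass one cue or several.
    if spatial_cues:
        parts.append(
            "Spatial context:\n" + "\n".join(f"- {c}" for c in spatial_cues)
        )

    # (ii) Amount of commonsense knowledge: optionally cap the facts used.
    facts = list(commonsense)[:max_commonsense]
    if facts:
        parts.append(
            "Background knowledge:\n" + "\n".join(f"- {f}" for f in facts)
        )

    parts.append("Question: " + question)

    # (iii) CoT prompting: appended after the question when enabled.
    if use_cot:
        parts.append(COT_INSTRUCTION)

    return "\n\n".join(parts)
```

Varying `spatial_cues`, `max_commonsense`, and `use_cot` independently would give one way to compare a targeted single cue against aggregated context, in the spirit of the analysis described above.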

Taxonomy

Topics: Multimodal Machine Learning Applications · Spatial Cognition and Navigation · Constraint Satisfaction and Optimization