Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective

Qiyao Xue; Weichen Liu; Shiqi Wang; Haoming Wang; Yuyang Wu; Wei Gao

arXiv:2512.02340·cs.AI·December 3, 2025

TL;DR

This paper introduces ReMindView-Bench, a benchmark for evaluating multi-view spatial reasoning in vision-language models (VLMs). It shows that these models struggle with cross-view alignment and perspective-taking, and it analyzes their reasoning process.

Contribution

The paper presents a cognitively grounded benchmark and a set of analysis methods for diagnosing the limitations of current VLMs in multi-view spatial reasoning.

Findings

01. VLMs struggle with cross-view alignment and perspective-taking.

02. Performance drops significantly when integrating information across views.

03. Analysis reveals progressive loss of task-relevant information during reasoning.

Abstract

Spatial reasoning is a core aspect of human intelligence that allows perception, inference and planning in 3D environments. However, current vision-language models (VLMs) struggle to maintain geometric coherence and cross-view consistency for spatial reasoning in multi-view settings. We attribute this gap to the lack of fine-grained benchmarks that isolate multi-view reasoning from single-view perception and temporal factors. To address this, we present ReMindView-Bench, a cognitively grounded benchmark for evaluating how VLMs construct, align and maintain spatial mental models across complementary viewpoints. ReMindView-Bench systematically varies viewpoint spatial pattern and query type to probe key factors of spatial cognition. Evaluations of 15 current VLMs reveal consistent failures in cross-view alignment and perspective-taking in multi-view spatial reasoning, motivating deeper…
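Because the benchmark varies viewpoint spatial pattern and query type, results are naturally read as accuracy broken down by each factor. The sketch below is only an illustration of that kind of breakdown, not the authors' evaluation code. The record fields (`view_pattern`, `query_type`, `answer`, `prediction`), the placeholder values, and the function name `accuracy_by_factor` are all hypothetical and are not taken from the paper or the dataset.

```python
from collections import defaultdict


def accuracy_by_factor(records, factor):
    """Group graded answers by one factor and return accuracy per group.

    `records` is a list of dicts; `factor` is the key to group on.
    All field names here are hypothetical, chosen for illustration.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        key = record[factor]
        totals[key] += 1
        # Count an item as correct only on an exact match with the reference.
        correct[key] += int(record["prediction"] == record["answer"])
    return {key: correct[key] / totals[key] for key in totals}


# Toy placeholder records, not real benchmark data or results.
records = [
    {"view_pattern": "pattern_a", "query_type": "type_x", "answer": "A", "prediction": "A"},
    {"view_pattern": "pattern_a", "query_type": "type_y", "answer": "B", "prediction": "C"},
    {"view_pattern": "pattern_b", "query_type": "type_x", "answer": "D", "prediction": "D"},
    {"view_pattern": "pattern_b", "query_type": "type_y", "answer": "A", "prediction": "A"},
]

by_pattern = accuracy_by_factor(records, "view_pattern")
by_query = accuracy_by_factor(records, "query_type")
print(by_pattern)
print(by_query)
```

Grouping on one factor at a time keeps the comparison simple; a joint breakdown would group on the pair of factors instead.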

Code & Models

Datasets

Xue0823/ReMindView-Bench (dataset, 194 downloads)
