VIEW2SPACE: Studying Multi-View Visual Reasoning from Sparse Observations

Fucai Ke; Zhixi Cai; Boying Li; Long Chen; Beibei Lin; Weiqing Wang; Pari Delir Haghighi; Gholamreza Haffari; Hamid Rezatofighi

arXiv:2603.16506 · cs.CV · March 19, 2026

Open Access

TL;DR

This paper introduces VIEW2SPACE, a new benchmark for multi-view visual reasoning using simulated 3D scenes, revealing current models' limitations and proposing methods to improve reasoning across sparse views.

Contribution

The paper presents a scalable simulation-based benchmark for multi-view reasoning and evaluates state-of-the-art models, highlighting the challenges and proposing grounded chain-of-thought methods for improvement.

Findings

01

Multi-view reasoning models perform only marginally better than random guessing.

02

Grounded chain-of-thought improves performance on moderate-difficulty questions.

03

Scaling models benefits geometric perception but not deep reasoning across sparse views.

Abstract

Multi-view visual reasoning is essential for intelligent systems that must understand complex environments from sparse and discrete viewpoints, yet existing research has largely focused on single-image or temporally dense video settings. In real-world scenarios, reasoning across views requires integrating partial observations without explicit guidance, while collecting large-scale multi-view data with accurate geometric and semantic annotations remains challenging. To address this gap, we leverage physically grounded simulation to construct diverse, high-fidelity 3D scenes with precise per-view metadata, enabling scalable data generation that remains transferable to real-world settings. Based on this engine, we introduce VIEW2SPACE, a multi-dimensional benchmark for sparse multi-view reasoning, together with a scalable, disjoint training split supporting millions of grounded…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Neural Network Applications