Thinking with Novel Views: A Systematic Analysis of Generative-Augmented Spatial Intelligence

Yanbing Zhang; Bo Wang; Jianhui Liu; Nan Jiang; Jiaxiu Jiang; Haoze Sun; Yijun Yang; Shenghe Zheng; Lin Song; Haoyang Huang; Nan Duan; Wenbo Li

arXiv:2605.10588 · cs.CV · May 12, 2026


TL;DR

This paper introduces Thinking with Novel Views (TwNV), a paradigm that enhances spatial reasoning in Large Multimodal Models (LMMs) by integrating generative novel-view synthesis into the reasoning loop. The result is consistent accuracy improvements across various tasks and architectures.

Contribution

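The paper presents a systematic approach that couples novel-view synthesis with reasoning to improve spatial understanding in LMMs. It examines three design axes (instruction format, generation fidelity, and inference-time visual scaling) and reports significant accuracy gains.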

Findings

1. Numerical camera-pose instructions outperform free-form language for view control.
2. Synthesized view quality directly impacts spatial reasoning accuracy.
3. Iterative multi-turn view refinement further enhances model performance.
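The first finding says numerical camera-pose instructions control the Painter more reliably than free-form language. The sketch below shows one way such a numerical instruction could be represented and applied. It is not the paper's format. The class `RelativePose`, the function `orbit_camera`, the orbit parametrization (azimuth and elevation around a scene center at the origin, y up), and the rendered text format are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativePose:
    """Hypothetical numerical view instruction: an orbit step relative to the current camera."""

    d_azimuth_deg: float        # rotation about the vertical (y) axis, in degrees
    d_elevation_deg: float      # change of elevation above the horizontal plane, in degrees
    radius_scale: float = 1.0   # multiplicative change of the distance to the scene center

    def to_instruction(self) -> str:
        """Render the pose as compact text the Reasoner could hand to the Painter."""
        return (
            f"azimuth={self.d_azimuth_deg:+.1f}deg, "
            f"elevation={self.d_elevation_deg:+.1f}deg, "
            f"radius_scale={self.radius_scale:.2f}"
        )


def orbit_camera(
    azimuth_deg: float, elevation_deg: float, radius: float, pose: RelativePose
) -> tuple[float, float, float]:
    """Apply `pose` to a camera orbiting the origin and return its new (x, y, z), y up.

    Azimuth is measured in the x-z plane from +z toward +x; elevation is measured
    upward from that plane.
    """
    az = math.radians(azimuth_deg + pose.d_azimuth_deg)
    el = math.radians(elevation_deg + pose.d_elevation_deg)
    r = radius * pose.radius_scale
    return (
        r * math.cos(el) * math.sin(az),
        r * math.sin(el),
        r * math.cos(el) * math.cos(az),
    )


pose = RelativePose(d_azimuth_deg=90.0, d_elevation_deg=0.0)
print(pose.to_instruction())  # azimuth=+90.0deg, elevation=+0.0deg, radius_scale=1.00
print(orbit_camera(0.0, 0.0, 2.0, pose))  # approximately (2.0, 0.0, 0.0)
```

A free-form request such as "look from the left side" leaves the rotation angle, and even the reference frame, to the Painter's interpretation. A numeric step fixes both. This is one plausible reason for the reported advantage, though the paper's own explanation may differ.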

Abstract

Current Large Multimodal Models (LMMs) struggle with spatial reasoning tasks requiring viewpoint-dependent understanding, largely because they are confined to a single, static observation. We propose Thinking with Novel Views (TwNV), a paradigm that integrates generative novel-view synthesis into the reasoning loop: a Reasoner LMM identifies spatial ambiguity, instructs a Painter to synthesize an alternative viewpoint, and re-examines the scene with the additional evidence. Through systematic experiments, we address three research questions. (1) Instruction format: numerical camera-pose specifications yield more reliable view control than free-form language. (2) Generation fidelity: synthesized view quality is tightly coupled with downstream spatial accuracy. (3) Inference-time visual scaling: iterative multi-turn view refinement further improves performance, echoing recent scaling…
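The abstract describes a loop in which a Reasoner LMM detects spatial ambiguity, asks a Painter for an alternative viewpoint, and re-examines the scene. Research question (3) adds that iterating this over multiple turns helps further.

Below is a minimal control-flow sketch of such a loop, not the authors' implementation. The following are all hypothetical:
- the names `think_with_novel_views`, `ViewRequest`, and `Answer`;
- the `reason` and `paint` callables;
- the `max_turns` budget;
- the choice to synthesize each new view from the most recent one.

Toy stand-ins replace the real models so the control flow runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass
class ViewRequest:
    """Hypothetical Reasoner output: 'show me the scene from this relative viewpoint'."""

    d_azimuth_deg: float
    d_elevation_deg: float


@dataclass
class Answer:
    """Hypothetical Reasoner output: a final answer to the spatial question."""

    text: str


Step = Union[ViewRequest, Answer]


def think_with_novel_views(
    question: str,
    image: Any,
    reason: Callable[[str, list, bool], Step],
    paint: Callable[[Any, ViewRequest], Any],
    max_turns: int = 3,
) -> tuple[str, list]:
    """Bounded reason -> synthesize -> re-examine loop in the spirit of TwNV.

    `reason(question, views, can_request)` stands in for the Reasoner LMM and
    `paint(view, request)` for the Painter. Returns the answer and every view used.
    """
    views = [image]
    for turn in range(max_turns + 1):
        can_request = turn < max_turns  # the last call must produce an answer
        step = reason(question, views, can_request)
        if isinstance(step, Answer):
            return step.text, views
        if not can_request:
            raise RuntimeError("Reasoner asked for a new view after the turn budget was spent")
        # Assumption: each new view is synthesized from the most recent one (multi-turn refinement).
        views.append(paint(views[-1], step))
    raise AssertionError("unreachable: the final turn always returns or raises")


# Toy stand-ins for the two models, for illustration only.
def toy_reason(question: str, views: list, can_request: bool) -> Step:
    if can_request and len(views) < 2:
        return ViewRequest(d_azimuth_deg=90.0, d_elevation_deg=0.0)
    return Answer(f"answered using {len(views)} views")


def toy_paint(view: Any, request: ViewRequest) -> str:
    return f"{view}@az{request.d_azimuth_deg:+.0f}"


answer, used = think_with_novel_views("Is the mug left of the laptop?", "img0", toy_reason, toy_paint)
print(answer)  # answered using 2 views
print(used)    # ['img0', 'img0@az+90']
```

Capping the number of turns bounds inference cost. In this sketch, raising `max_turns` is the knob that inference-time visual scaling would adjust.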
