Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling

Saurav Jha; M. Jehanzeb Mirza; Wei Lin; Shiqi Yang; Sarath Chandar

arXiv:2512.05809 · cs.CV · December 8, 2025


Open Access

TL;DR

This paper evaluates how effective test-time verification methods are for world-model-based spatial reasoning, introduces a new verification framework, and analyzes the strengths and limitations of these methods across benchmarks.

Contribution

It systematically analyzes test-time verifiers used with world models, introduces ViSA (Verification through Spatial Assertions) for improved verification, and shows that the verifiers' success varies across spatial reasoning benchmarks.

Findings

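01. Heuristic verifiers such as MindJourney's provide little meaningful calibration, and random scoring often reduces answer entropy just as well, exposing systematic action biases and unreliable reward signals.

02. ViSA improves spatial reasoning on SAT-Real by grounding the test-time reward in verifiable claims.

03. Current world models face an information bottleneck that limits test-time scaling on complex benchmarks.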

Abstract

Vision-Language Models (VLMs) remain limited in spatial reasoning tasks that require multi-view understanding and embodied perspective shifts. Recent approaches such as MindJourney attempt to mitigate this gap through test-time scaling where a world model imagines action-conditioned trajectories and a heuristic verifier selects helpful views from such trajectories. In this work, we systematically examine how such test-time verifiers behave across benchmarks, uncovering both their promise and their pitfalls. Our uncertainty-based analyses show that MindJourney's verifier provides little meaningful calibration, and that random scoring often reduces answer entropy equally well, thus exposing systematic action biases and unreliable reward signals. To mitigate these, we introduce a Verification through Spatial Assertions (ViSA) framework that grounds the test-time reward in verifiable,…
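The uncertainty analysis above compares how much a verifier's choice of imagined views reduces answer entropy against a random-scoring baseline. The Python sketch below shows the two ingredients under stated assumptions: entropy is taken over repeated VLM answers to the same question, and selection keeps the top-k scored views. The names `answer_entropy`, `select_top_k`, and `random_scorer`, along with the view labels and scores in the usage example, are hypothetical illustrations, not code from MindJourney or ViSA.

```python
import math
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, TypeVar

View = TypeVar("View")


def answer_entropy(answers: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of sampled answers.

    Assumption: uncertainty is measured over repeated answers to one question.
    """
    n = len(answers)
    if n == 0:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def select_top_k(views: Sequence[View], score: Callable[[View], float], k: int) -> List[View]:
    """Keep the k highest-scoring imagined views (best-of-N style selection)."""
    return sorted(views, key=score, reverse=True)[:k]


def random_scorer(seed: int = 0) -> Callable[[object], float]:
    """Baseline scorer that ignores the view and returns a uniform random score."""
    rng = random.Random(seed)
    return lambda _view: rng.random()


if __name__ == "__main__":
    # Hypothetical action-conditioned views and heuristic verifier scores.
    views = ["turn_left_30", "forward_1m", "turn_right_30"]
    heuristic = {"turn_left_30": 0.9, "forward_1m": 0.4, "turn_right_30": 0.7}.get

    print(select_top_k(views, heuristic, 2))         # ['turn_left_30', 'turn_right_30']
    print(select_top_k(views, random_scorer(0), 2))  # random baseline selection
    print(answer_entropy(["left", "left", "right", "left"]))
```

Feeding the views chosen by `heuristic` and by `random_scorer` to the same VLM, then comparing `answer_entropy` over the resulting answers, is one way to frame the random-scoring comparison; the paper's actual protocol may differ.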

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Spatial Cognition and Navigation