MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

Yuncong Yang; Jiageng Liu; Zheyuan Zhang; Siyuan Zhou; Reuben Tan; Jianwei Yang; Yilun Du; Chuang Gan

arXiv:2507.12508 · cs.CV · November 4, 2025

TL;DR

MindJourney introduces a test-time scaling framework that enhances vision-language models with a controllable world model for improved 3D spatial reasoning without fine-tuning, demonstrated by performance gains on the SAT benchmark.

Contribution

The paper presents a novel test-time scaling approach coupling VLMs with a video diffusion-based world model for robust 3D spatial reasoning.

Findings

01 Achieves an average performance boost of over 7.7% on the SAT benchmark.

02 Improves VLM reasoning without fine-tuning.

03 Demonstrates the effectiveness of world models for test-time scaling.

Abstract

Spatial reasoning in 3D space is central to human cognition and indispensable for embodied tasks such as navigation and manipulation. However, state-of-the-art vision-language models (VLMs) frequently struggle with tasks as simple as anticipating how a scene will look after an egocentric motion: they perceive 2D images but lack an internal model of 3D dynamics. We therefore propose MindJourney, a test-time scaling framework that grants a VLM this missing capability by coupling it to a controllable world model based on video diffusion. The VLM iteratively sketches a concise camera trajectory, while the world model synthesizes the corresponding view at each step. The VLM then reasons over the multi-view evidence gathered during this interactive exploration. Without any fine-tuning, MindJourney achieves an average performance boost of over 7.7% on the representative spatial reasoning…
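
The exploration loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `explore_and_answer`, `View`, `propose_action`, `render_view`, and `answer`, the action strings, and the fixed step budget are all hypothetical, and the toy stand-ins at the bottom replace the real VLM and video-diffusion world model with plain string functions so the control flow can be run as-is.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Action = str  # hypothetical action names, e.g. "forward", "turn_left", "turn_right"


@dataclass
class View:
    trajectory: Tuple[Action, ...]  # camera moves taken from the starting view
    image: object                   # placeholder for an observed or synthesized image


def explore_and_answer(
    start_image: object,
    question: str,
    propose_action: Callable[[str, List[View]], Optional[Action]],
    render_view: Callable[[object, Tuple[Action, ...]], object],
    answer: Callable[[str, List[View]], str],
    max_steps: int = 3,
) -> Tuple[str, List[View]]:
    """Alternate between proposing a camera move and synthesizing the resulting view."""
    views = [View((), start_image)]  # the original observation is the first piece of evidence
    trajectory: Tuple[Action, ...] = ()
    for _ in range(max_steps):
        # Stand-in for the VLM choosing the next camera move (None means stop exploring).
        action = propose_action(question, views)
        if action is None:
            break
        trajectory += (action,)
        # Stand-in for the world model synthesizing the view after this trajectory.
        views.append(View(trajectory, render_view(start_image, trajectory)))
    # Stand-in for the VLM reasoning over all gathered views.
    return answer(question, views), views


# Toy stand-ins (assumptions for illustration only; no real model is called).
def toy_propose(question: str, views: List[View]) -> Optional[Action]:
    plan = ("turn_left", "forward")
    step = len(views) - 1
    return plan[step] if step < len(plan) else None


def toy_render(start_image: object, trajectory: Tuple[Action, ...]) -> str:
    return f"{start_image}+{'/'.join(trajectory)}"


def toy_answer(question: str, views: List[View]) -> str:
    return f"answered using {len(views)} views"


result, views = explore_and_answer(
    "img0", "What is to the left of the sofa?", toy_propose, toy_render, toy_answer
)
```

In this sketch the VLM and the world model are passed in as callables, so the loop makes no assumption about how either model is served. No weights are updated anywhere in the loop, which mirrors the abstract's point that the gain comes at test time without fine-tuning.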

Code & Models

Models

Hugging Face: yyuncong/MindJourney-World-Model
model · 29 dl · ♡ 2

Videos

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning · YouTube

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning · SlidesLive

Taxonomy

Topics: Constraint Satisfaction and Optimization · Semantic Web and Ontologies · Data Management and Algorithms