Think3D: Thinking with Space for Spatial Reasoning

Zaibin Zhang; Yuhan Wu; Lianjie Jia; Yifan Wang; Zhongbo Zhang; Yijiang Li; Binghao Ran; Fuxi Zhang; Zhuohan Sun; Zhenfei Yin; Lijun Wang; Huchuan Lu

arXiv:2601.13029 · cs.CV · March 18, 2026

TL;DR

Think3D is an interactive 3D reasoning framework for vision-language models (VLMs). It improves their spatial understanding and reasoning by letting them call 3D manipulation tools, and it uses reinforcement learning to optimize tool use in smaller models.

Contribution

A framework that enables active 3D spatial reasoning in VLMs. It works as a zero-shot plug-in for large closed-source models and comes with a reinforcement learning method (Think3D-RL) for smaller open-weight models.

Findings

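1. As a zero-shot plug-in for closed-source models (e.g., GPT-4.1, Gemini 2.5 Pro), Think3D yields absolute gains of +7.8% on BLINK Multi-view and MindCube.
2. The same plug-in setup yields an absolute gain of +4.7% on VSI-Bench.
3. For small models, reinforcement learning raises the performance gain from +0.7% to +10.7%.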

Abstract

While contemporary Vision-Language Models (VLMs) excel at 2D visual understanding, they remain constrained by a passive, 2D-centric paradigm that severely limits genuine 3D spatial reasoning. To bridge this gap, we introduce Think3D, a novel framework that equips VLM agents with interactive, 3D chain-of-thought reasoning capabilities. By integrating a suite of 3D manipulation tools, Think3D transforms passive perception into active spatial exploration, closely mirroring human geometric reasoning. We demonstrate that Think3D acts as a highly effective zero-shot plug-in for state-of-the-art closed-source models (e.g., GPT-4.1, Gemini 2.5 Pro), yielding absolute performance gains of +7.8% on BLINK Multi-view and MindCube, and +4.7% on VSI-Bench. Furthermore, to optimize tool-use in smaller open-weight models, we propose Think3D-RL, a reinforcement learning paradigm designed to autonomously…
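The abstract does not specify the tool interface, so the following is only a minimal sketch of what one "3D manipulation tool" could look like, assuming the scene has already been reconstructed as a 3D point cloud. The hypothetical tool `render_view` orbits a virtual pinhole camera around the scene by a yaw angle and projects the points to pixels. The helper `is_left_of` uses it to answer a viewpoint-dependent question that a single fixed 2D view cannot settle. All names, the yaw-only camera motion, and the camera intrinsics are illustrative assumptions, not Think3D's actual toolset.

```python
import numpy as np


def yaw_rotation(deg: float) -> np.ndarray:
    """3x3 rotation about the vertical (y) axis by `deg` degrees."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def render_view(points, yaw_deg, distance=5.0, focal=100.0, cx=64.0, cy=64.0):
    """Hypothetical 3D tool: orbit a pinhole camera by `yaw_deg` around the
    scene origin and project world points to pixel coordinates.

    Camera convention (OpenCV-style): x right, y down, z forward.
    Returns (uv, depth, visible); uv is NaN for points behind the camera.
    """
    pts = np.asarray(points, dtype=float)
    # Rotating the scene by -yaw is equivalent to orbiting the camera by +yaw.
    cam = pts @ yaw_rotation(-yaw_deg).T
    # The camera sits at z = -distance looking toward +z, so depth shifts by +distance.
    cam[:, 2] += distance
    depth = cam[:, 2]
    visible = depth > 1e-6
    safe_depth = np.where(visible, depth, np.nan)
    u = focal * cam[:, 0] / safe_depth + cx
    v = focal * cam[:, 1] / safe_depth + cy
    return np.stack([u, v], axis=1), depth, visible


def is_left_of(points, i, j, yaw_deg):
    """Is point i left of point j in the image seen from `yaw_deg`?
    Returns None when either point is not visible from that view."""
    uv, _, visible = render_view(points, yaw_deg)
    if not (visible[i] and visible[j]):
        return None
    return bool(uv[i, 0] < uv[j, 0])


if __name__ == "__main__":
    # Toy scene: object A at x = -1, object B at x = +1.
    scene = np.array([[-1.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0]])
    # An agent could call the tool at several viewpoints before answering.
    for yaw in (0, 180):
        print(yaw, is_left_of(scene, 0, 1, yaw))
    # 0 True
    # 180 False
```

Calling the tool at yaw 0 and yaw 180 gives opposite answers for the same pair of objects. This is the kind of viewpoint change that active exploration makes available to a model, and that a passive single-view reading of the scene does not.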

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics