Thinking with Geometry: Active Geometry Integration for Spatial Reasoning

Haoyuan Li; Qihang Cao; Tao Tang; Kun Xiang; Zihan Guo; Jianhua Han; JiaWang Bian; Hang Xu; Xiaodan Liang

arXiv:2602.06037 · cs.CV · May 19, 2026

TL;DR

GeoThinker introduces an active perception framework for spatial reasoning in multimodal models, enabling selective geometric evidence retrieval to improve spatial understanding and generalization.

Contribution

The paper proposes GeoThinker, an active perception approach that improves spatial reasoning by integrating geometry selectively, conditioned on the model's reasoning demands.

Findings

1. Achieves a new state-of-the-art score of 72.6 on VSI-Bench.

2. Demonstrates improved spatial perception in embodied referring and autonomous driving.

3. Shows robust generalization across complex downstream scenarios.

Abstract

Recent progress in spatial reasoning with Multimodal Large Language Models (MLLMs) increasingly leverages geometric priors from 3D encoders. However, most existing integration strategies remain passive: geometry is exposed as a global stream and fused in an indiscriminate manner, which often induces semantic-geometry misalignment and redundant signals. We propose GeoThinker, a framework that shifts the paradigm from passive fusion to active perception. Instead of feature mixing, GeoThinker enables the model to selectively retrieve geometric evidence conditioned on its internal reasoning demands. GeoThinker achieves this through Spatial-Grounded Fusion applied at carefully selected VLM layers, where semantic visual priors selectively query and integrate task-relevant geometry via frame-strict cross-attention, further calibrated by Importance Gating that biases per-frame attention toward…
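To make the idea of frame-strict cross-attention concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `frame_strict_cross_attention`, the token layout, the absence of learned projections, and the residual update are all hypothetical. The abstract is cut off before it says what Importance Gating biases attention toward, so `gate_bias` below is only a generic placeholder (an additive per-token bias on the attention logits) and should not be read as the paper's gating rule.

```python
import math

def frame_strict_cross_attention(sem, geo, sem_frame, geo_frame, gate_bias=None):
    """Hypothetical sketch: semantic tokens query geometry tokens of the SAME frame only.

    sem, geo:             lists of d-dimensional token vectors (queries, keys/values)
    sem_frame, geo_frame: frame index of each token
    gate_bias:            optional per-geometry-token additive logit bias
                          (a placeholder for importance gating, an assumption)
    Returns (updated semantic tokens, attention weights over all geometry tokens).
    """
    d = len(sem[0])
    if gate_bias is None:
        gate_bias = [0.0] * len(geo)
    outputs, all_weights = [], []
    for q, fq in zip(sem, sem_frame):
        # Frame-strict mask: keep only geometry tokens from the query's own frame.
        idx = [j for j, fg in enumerate(geo_frame) if fg == fq]
        weights = [0.0] * len(geo)
        if not idx:
            # No geometry available for this frame: pass the token through unchanged.
            outputs.append(list(q))
            all_weights.append(weights)
            continue
        # Scaled dot-product logits plus the optional gate bias.
        logits = [
            sum(a * b for a, b in zip(q, geo[j])) / math.sqrt(d) + gate_bias[j]
            for j in idx
        ]
        # Numerically stable softmax over the same-frame tokens only.
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        z = sum(exps)
        for j, e in zip(idx, exps):
            weights[j] = e / z
        # Weighted sum of geometry tokens, added residually to the query token.
        attended = [sum(weights[j] * geo[j][k] for j in idx) for k in range(d)]
        outputs.append([a + b for a, b in zip(q, attended)])
        all_weights.append(weights)
    return outputs, all_weights

# Toy usage: two frames, one semantic token each.
sem = [[1.0, 0.0], [0.0, 1.0]]
sem_frame = [0, 1]
geo = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
geo_frame = [0, 0, 1]
out, w = frame_strict_cross_attention(sem, geo, sem_frame, geo_frame)
```

In this toy setup the frame-0 query places zero weight on the frame-1 geometry token, so changing geometry in one frame cannot affect tokens of another. That isolation is the property the term "frame-strict" suggests, in contrast to a global stream in which every token can attend to all geometry.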

Code & Models

Repositories

Li-Hao-yuan/GeoThinker (GitHub)

Models

lihy285/GeoThinker (Hugging Face model)
