GeoWeaver: Grounding Visual Tokens with Geometric Evidence before Scene Reasoning

Deshui Miao; Xingsen Huang; Yameng Gu; Xin Li; Haijun Zhang; Ming-Hsuan Yang

arXiv:2605.22558·cs.CV·May 22, 2026

TL;DR

GeoWeaver is a geometric grounding framework that adaptively incorporates geometric evidence into visual tokens before reasoning, improving spatial reasoning in vision-language models.

Contribution

The paper proposes a token-adaptive method for allocating geometric evidence: each visual token is grounded with the geometric abstractions relevant to its spatial role, which supports geometry-aware reasoning.

Findings

01. Consistently improves performance on spatial reasoning benchmarks.

02. Retains general multimodal capabilities.

03. Highlights geometric information as a fundamental prerequisite for reasoning.

Abstract

Spatio-temporal reasoning in vision-language models requires visual representations that preserve physical geometry rather than merely semantic appearance. Recent multimodal models incorporate geometric information through structural branches, 3D-aware supervision, reasoning-stage fusion, or long-horizon memory. While these approaches demonstrate the importance of geometry for spatial intelligence, they typically treat geometric cues as a shared signal across all visual tokens. We note that this overlooks a finer-grained challenge: different visual tokens require different geometric evidence depending on their spatial roles. To address this limitation, we introduce GeoWeaver, a pre-reasoning geometric grounding framework that treats geometry as a representational prerequisite for spatio-temporal reasoning. GeoWeaver constructs a multi-level geometry bank from a frozen geometry encoder…
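The contrast the abstract draws, between geometry as a shared signal and geometry allocated per token, can be illustrated with a small sketch. This is not the paper's implementation: the abstract is truncated before the method details, so the function names (`ground_tokens`, `ground_tokens_shared`), the dot-product scoring, the residual addition, and the projection `w_query` are all assumptions made for illustration. The geometry bank is stood in for by random arrays rather than the output of a real frozen geometry encoder.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def ground_tokens_shared(tokens, geometry_bank):
    """Baseline: every token receives the same uniform mix of geometry levels.

    tokens:        (N, D) visual tokens
    geometry_bank: (L, N, D) one token-aligned feature map per geometry level
    """
    return tokens + geometry_bank.mean(axis=0)


def ground_tokens(tokens, geometry_bank, w_query):
    """Hypothetical token-adaptive allocation (an assumption, not the paper's method).

    Each token scores every geometry level and receives its own weighted mix.
    w_query: (D, D) illustrative projection from tokens to queries.
    Returns the grounded tokens (N, D) and the per-token level weights (N, L).
    """
    dim = tokens.shape[1]
    queries = tokens @ w_query                                   # (N, D)
    # Scaled dot product between each token's query and its feature at each level.
    scores = np.einsum("nd,lnd->nl", queries, geometry_bank) / np.sqrt(dim)
    weights = softmax(scores, axis=1)                            # (N, L)
    evidence = np.einsum("nl,lnd->nd", weights, geometry_bank)   # (N, D)
    return tokens + evidence, weights


# Toy data: 4 tokens, 8 channels, 3 geometry levels (all values are placeholders).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
bank = rng.normal(size=(3, 4, 8))
w_query = 0.1 * rng.normal(size=(8, 8))

grounded, weights = ground_tokens(tokens, bank, w_query)
print(weights.round(3))  # one row of level weights per token; each row sums to 1
```

In this sketch the per-token weights are the only difference from the shared baseline: when every level of the bank carries the same features, both functions return the same result, and the weights matter only when the levels differ.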

Code & Models

Repositories

yahooo-m/GeoWeaver (GitHub)
