GeoViS: Geospatially Rewarded Visual Search for Remote Sensing Visual Grounding

Peirong Zhang; Yidan Zhang; Luxiao Xu; Jinliang Lin; Zonghao Guo; Fengxiang Wang; Xue Yang; Kaiwen Wei; Lei Wang

arXiv:2512.02715·cs.CV·December 3, 2025

TL;DR

GeoViS introduces a progressive, reward-guided visual search framework for remote sensing imagery, improving the detection of small targets and understanding complex geospatial relations through iterative exploration and reasoning.

Contribution

It reformulates remote sensing visual grounding as a progressive search process, integrating multimodal perception, spatial reasoning, and reward-guided exploration for enhanced accuracy.

Findings

01 Outperforms existing methods on five benchmarks

02 Achieves precise geospatial understanding

03 Demonstrates strong cross-domain generalization

Abstract

Recent advances in multimodal large language models (MLLMs) have led to remarkable progress in visual grounding, enabling fine-grained cross-modal alignment between textual queries and image regions. However, transferring such capabilities to remote sensing imagery remains challenging, as targets are often extremely small within kilometer-scale scenes, and queries typically involve intricate geospatial relations such as relative positions, spatial hierarchies, or contextual dependencies across distant objects. To address these challenges, we propose GeoViS, a Geospatially Rewarded Visual Search framework that reformulates remote sensing visual grounding as a progressive search-and-reasoning process. Rather than directly predicting the target location in a single step, GeoViS actively explores the global image through a tree-structured sequence of visual cues, integrating multimodal…
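To make the idea of a progressive, reward-guided search concrete, the following is a minimal sketch and not the GeoViS method itself. The excerpt above does not specify the reward or how the tree is built, so everything here is an assumption for illustration: a quadtree split stands in for the "tree-structured sequence of visual cues", a best-first priority queue stands in for the exploration policy, and the hypothetical `make_overlap_reward` replaces a learned geospatial reward with a toy overlap score against a known target box.

```python
import heapq
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def split_quadrants(box: Box) -> List[Box]:
    """Split a box into four child boxes (the branching step of the tree)."""
    x0, y0, x1, y1 = box
    mx, my = (x0 + x1) // 2, (y0 + y1) // 2
    return [(x0, y0, mx, my), (mx, y0, x1, my),
            (x0, my, mx, y1), (mx, my, x1, y1)]


def make_overlap_reward(target: Box) -> Callable[[Box], float]:
    """Toy reward (assumption): fraction of a candidate box covered by a
    known target. A real system would score regions with a model instead."""
    tx0, ty0, tx1, ty1 = target

    def reward(box: Box) -> float:
        x0, y0, x1, y1 = box
        w = max(0, min(x1, tx1) - max(x0, tx0))
        h = max(0, min(y1, ty1) - max(y0, ty0))
        area = (x1 - x0) * (y1 - y0)
        return (w * h) / area if area > 0 else 0.0

    return reward


def progressive_search(root: Box,
                       reward: Callable[[Box], float],
                       min_size: int = 64,
                       max_steps: int = 50) -> Box:
    """Best-first search: always expand the highest-reward region next."""
    root_score = reward(root)
    # heapq is a min-heap, so rewards are negated to pop the maximum first.
    frontier = [(-root_score, root)]
    best_score, best_box = root_score, root
    for _ in range(max_steps):
        if not frontier:
            break
        neg_score, box = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_score, best_box = -neg_score, box
        x0, y0, x1, y1 = box
        # Stop once the most promising region is small enough to report.
        if (x1 - x0) <= min_size or (y1 - y0) <= min_size:
            return box
        for child in split_quadrants(box):
            heapq.heappush(frontier, (-reward(child), child))
    # Step budget exhausted: fall back to the best region expanded so far.
    return best_box


toy_reward = make_overlap_reward((700, 200, 740, 240))
region = progressive_search((0, 0, 1024, 1024), toy_reward)
print(region)  # → (704, 192, 768, 256)
```

In this toy run a 40×40 target inside a 1024×1024 scene is localized to a 64×64 region after expanding only four nodes, which illustrates why a coarse-to-fine search can help when targets occupy a tiny fraction of the image.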

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning