ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks
Ruixun Liu, Bowen Fu, Jiayi Song, Kaiyu Li, Wanchen Li, Lanxuan Xue, Hui Qiao, Weizhan Zhang, Deyu Meng, Xiangyong Cao

TL;DR
ZoomEarth introduces an active perception framework for ultra-high-resolution remote sensing images that lets models selectively revisit information-rich regions, improving performance and versatility across geospatial vision-language tasks.
Contribution
The paper presents ZoomEarth, an adaptive cropping-zooming framework with a novel Region-Guided reward, and LRS-GRO, a large-scale benchmark dataset for UHR remote sensing. Both contributions advance active perception in this domain.
Findings
ZoomEarth achieves state-of-the-art results on LRS-GRO and public benchmarks.
The framework effectively guides models to focus on information-rich regions.
ZoomEarth demonstrates strong versatility across downstream geospatial tasks.
Abstract
Ultra-high-resolution (UHR) remote sensing (RS) images offer rich fine-grained information but also present challenges in effective processing. Existing dynamic resolution and token pruning methods are constrained by a passive perception paradigm, suffering from increased redundancy when obtaining finer visual inputs. In this work, we explore a new active perception paradigm that enables models to revisit information-rich regions. First, we present LRS-GRO, a large-scale benchmark dataset tailored for active perception in UHR RS processing, encompassing 17 question types across global, region, and object levels, annotated via a semi-automatic pipeline. Building on LRS-GRO, we propose ZoomEarth, an adaptive cropping-zooming framework with a novel Region-Guided reward that provides fine-grained guidance. Trained via supervised fine-tuning (SFT) and Group Relative Policy Optimization…
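The cropping-zooming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `crop_and_zoom` and `box_iou`, the normalized box format, and the nearest-neighbor resampling are all assumptions made for illustration. The abstract does not define the Region-Guided reward, so the overlap score below is only a hypothetical stand-in for a region-level signal.

```python
from typing import List, Tuple

# Hypothetical types for illustration; not taken from the paper.
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]
Image = List[List[int]]                  # row-major grid of pixel values


def crop_and_zoom(image: Image, box: Box, out_h: int, out_w: int) -> Image:
    """Crop a normalized box and resample it to out_h x out_w (nearest neighbor)."""
    h, w = len(image), len(image[0])
    x0, y0, x1, y1 = box
    # Convert normalized coordinates to pixel bounds, keeping at least one pixel.
    left = min(int(x0 * w), w - 1)
    top = min(int(y0 * h), h - 1)
    right = max(left + 1, min(int(x1 * w), w))
    bottom = max(top + 1, min(int(y1 * h), h))
    crop_h, crop_w = bottom - top, right - left
    # Map each output pixel back to its nearest source pixel inside the crop.
    return [
        [image[top + (r * crop_h) // out_h][left + (c * crop_w) // out_w]
         for c in range(out_w)]
        for r in range(out_h)
    ]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; an assumed proxy for region overlap."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A 4x4 toy "image" whose pixel value encodes its position.
    toy = [[r * 4 + c for c in range(4)] for r in range(4)]
    # Revisit the bottom-right quadrant at the full 4x4 input size.
    for row in crop_and_zoom(toy, (0.5, 0.5, 1.0, 1.0), 4, 4):
        print(row)
    # Overlap between a predicted region and a reference region.
    print(box_iou((0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 0.5, 1.0)))
```

The point of the sketch is that a selected region is re-fed to the model at the same input size, so detail grows without growing the token budget for the whole image. A real pipeline would operate on image tensors and use a proper interpolation method.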
Taxonomy
Topics: Remote-Sensing Image Classification · Advanced Neural Network Applications · Advanced Image Processing Techniques
