Look Where It Matters: Training-Free Ultra-HR Remote Sensing VQA via Adaptive Zoom Search
Yunqi Zhou, Chengjie Jiang, Chun Yuan, Jing Li

TL;DR
ZoomSearch is a training-free, plug-and-play pipeline that enhances Ultra-HR remote sensing visual question answering by localizing relevant image regions through hierarchical search and reorganizing patches for efficient, accurate predictions.
Contribution
It introduces a novel training-free approach combining adaptive zoom search and layout-aware patch reassembly for Ultra-HR RS-VQA, significantly improving accuracy and efficiency.
Findings
Achieves state-of-the-art accuracy on Ultra-HR RS-VQA benchmarks.
Improves inference speed by 20-44% over prior methods.
Enhances model focus on relevant image regions, boosting performance.
Abstract
With advances in satellite constellations, sensor technologies, and imaging pipelines, ultra-high-resolution (Ultra-HR) remote sensing imagery is becoming increasingly widespread. However, current remote sensing foundation models are ill-suited to such inputs: full-image encoding exhausts token and memory budgets, while resize-based preprocessing loses fine-grained, answer-critical details. In this context, guiding the model to look where it matters before prediction becomes crucial. We therefore present ZoomSearch, a training-free, plug-and-play pipeline that decouples 'where to look' from 'how to answer' for Ultra-HR Remote Sensing Visual Question Answering (RS-VQA). ZoomSearch combines Adaptive Multi-Branch Zoom Search, which performs a hierarchical search over image patches to localize query-relevant regions, with Layout-Aware Patch Reassembly, which reorganizes the selected…
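The idea of a hierarchical, multi-branch search over image patches can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the relevance scorer, the split factor, the branching rule, or the stopping criterion, so `split`, `zoom_search`, `make_toy_score`, and the parameters `grid`, `branches`, and `min_size` are all hypothetical. The sketch keeps the best-scoring boxes at each zoom level (a simple beam search) and stops once further splitting would produce patches smaller than `min_size`.

```python
from math import hypot
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split(box: Box, grid: int = 2) -> List[Box]:
    """Split a box into grid x grid children, row by row."""
    left, top, right, bottom = box
    xs = [left + (right - left) * i // grid for i in range(grid + 1)]
    ys = [top + (bottom - top) * j // grid for j in range(grid + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(grid) for i in range(grid)]


def zoom_search(root: Box,
                score: Callable[[Box], float],
                branches: int = 2,
                min_size: int = 512,
                grid: int = 2) -> List[Box]:
    """Beam-style zoom: keep the `branches` best boxes per level.

    `score` is an assumed query-relevance function (higher is better);
    in practice it would come from a vision-language model.
    """
    frontier, leaves = [root], []
    while frontier:
        candidates: List[Box] = []
        for box in frontier:
            left, top, right, bottom = box
            # Stop zooming when children would fall below min_size.
            if min(right - left, bottom - top) // grid < min_size:
                leaves.append(box)
                continue
            children = sorted(split(box, grid), key=score, reverse=True)
            candidates.extend(children[:branches])
        # Prune the whole level so the search cost stays bounded.
        frontier = sorted(candidates, key=score, reverse=True)[:branches]
    return leaves


def make_toy_score(target: Tuple[int, int]) -> Callable[[Box], float]:
    """Toy stand-in scorer: boxes centred nearer `target` score higher."""
    def score(box: Box) -> float:
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        return -hypot(cx - target[0], cy - target[1])
    return score


if __name__ == "__main__":
    scorer = make_toy_score((3000, 500))
    print(zoom_search((0, 0, 4096, 4096), scorer, branches=1))
```

With `branches=1` the search follows a single path; larger values keep several candidate regions alive per level, trading extra scorer calls for robustness to an early wrong turn.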
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
