GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
Yue Zhou, Mengcheng Lan, Xiang Li, Litong Feng, Yiping Ke, Xue Jiang, Qingyun Li, Xue Yang, Wayne Zhang

TL;DR
GeoGround is a unified large vision-language model that effectively handles diverse remote sensing visual grounding tasks, including bounding boxes, oriented boxes, and segmentation masks, by supporting flexible outputs and leveraging prompt-assisted and geometry-guided learning.
Contribution
GeoGround introduces a framework that unifies multiple remote sensing (RS) visual grounding tasks in a single model without customizing the architecture, and it supports dense prediction outputs such as segmentation masks through its Text-Mask technique.
Findings
Strong performance across four RS visual grounding tasks
Matches specialized methods on multiple benchmarks
Supports flexible output types including masks and bounding boxes
Abstract
Remote sensing (RS) visual grounding aims to use natural language expression to locate specific objects (in the form of the bounding box or segmentation mask) in RS images, enhancing human interaction with intelligent RS interpretation systems. Early research in this area was primarily based on horizontal bounding boxes (HBBs), but as more diverse RS datasets have become available, tasks involving oriented bounding boxes (OBBs) and segmentation masks have emerged. In practical applications, different targets require different grounding types: HBB can localize an object's position, OBB provides its orientation, and mask depicts its shape. However, existing specialized methods are typically tailored to a single type of RS visual grounding task and are hard to generalize across tasks. In contrast, large vision-language models (VLMs) exhibit powerful multi-task learning capabilities but…
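The three grounding types named in the abstract are geometrically related: an HBB can be derived from either an OBB or a mask, but not the reverse, which is why each richer type carries extra information (orientation, then shape). The following is a minimal sketch of that relationship, not GeoGround's own code. The OBB parameterization (center, width, height, angle in degrees), the list-of-lists mask format, and all function names are assumptions made for illustration.

```python
import math

def obb_to_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of an oriented box centred at (cx, cy).

    Assumed convention: angle_deg rotates the box about its centre using the
    standard 2D rotation matrix (counter-clockwise when the y axis points up).
    """
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in half
    ]

def points_to_hbb(points):
    """Tightest axis-aligned box (x_min, y_min, x_max, y_max) around points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mask_to_hbb(mask):
    """Tightest axis-aligned box around the nonzero cells of a 2D 0/1 mask.

    Coordinates are inclusive (column, row) indices; returns None if the
    mask is empty.
    """
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        return None
    return points_to_hbb(coords)

# A 40x20 box rotated by 90 degrees occupies a 20x40 axis-aligned extent.
hbb_from_obb = points_to_hbb(obb_to_corners(50, 50, 40, 20, 90))

# An L-shaped mask and its enclosing box: the box loses the shape.
hbb_from_mask = mask_to_hbb([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
])
```

Under these assumptions, `hbb_from_obb` is approximately `(40, 30, 60, 70)` up to floating-point rounding, and `hbb_from_mask` is `(1, 1, 2, 2)`. In both cases the conversion discards information: the rotation angle in the first, the exact object outline in the second.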
Taxonomy
Topics: Geographic Information Systems Studies · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
