SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images
Zepeng Xin, Kaiyu Li, Luodi Chen, Wanchen Li, Yuchen Xiao, Hui Qiao, Weizhan Zhang, Deyu Meng, Xiangyong Cao

TL;DR
This paper introduces LaSeRS, a large-scale dataset for complex language-guided segmentation in remote sensing images, and proposes SegEarth-R2, a new model architecture that effectively handles multi-target, hierarchical, and reasoning-based segmentation tasks.
Contribution
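The paper contributes LaSeRS, a large-scale dataset for complex language-guided segmentation of remote sensing images, and SegEarth-R2, a model architecture that combines spatial attention supervision with a flexible segmentation query mechanism to better localize small objects and handle multi-target instructions.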
Findings
SegEarth-R2 outperforms existing models on LaSeRS and other benchmarks.
The spatial attention supervision improves localization of small objects.
The flexible segmentation query mechanism handles multi-target scenarios effectively.
Abstract
Effectively grounding complex language to pixels in remote sensing (RS) images is a critical challenge for applications like disaster response and environmental monitoring. Current models can parse simple, single-target commands but fail when presented with complex geospatial scenarios, e.g., segmenting objects at various granularities, executing multi-target instructions, and interpreting implicit user intent. To drive progress against these failures, we present LaSeRS, the first large-scale dataset built for comprehensive training and evaluation across four critical dimensions of language-guided segmentation: hierarchical granularity, target multiplicity, reasoning requirements, and linguistic variability. By capturing these dimensions, LaSeRS moves beyond simple commands, providing a benchmark for complex geospatial reasoning. This addresses a critical gap: existing datasets…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
