SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images

Zepeng Xin; Kaiyu Li; Luodi Chen; Wanchen Li; Yuchen Xiao; Hui Qiao; Weizhan Zhang; Deyu Meng; Xiangyong Cao

arXiv:2512.20013·cs.CV·December 24, 2025

Open Access

TL;DR

This paper introduces LaSeRS, a large-scale dataset for complex language-guided segmentation in remote sensing images, and proposes SegEarth-R2, a new model architecture that effectively handles multi-target, hierarchical, and reasoning-based segmentation tasks.

Contribution

The paper contributes LaSeRS, a large-scale dataset for complex language-guided segmentation of remote sensing images, and SegEarth-R2, a model architecture that combines spatial attention supervision with a flexible segmentation query mechanism.

Findings

01. SegEarth-R2 outperforms existing models on LaSeRS and other benchmarks.

02. The spatial attention supervision improves localization of small objects.

03. The flexible segmentation query mechanism handles multi-target scenarios effectively.

Abstract

Effectively grounding complex language to pixels in remote sensing (RS) images is a critical challenge for applications like disaster response and environmental monitoring. Current models can parse simple, single-target commands but fail when presented with complex geospatial scenarios, e.g., segmenting objects at various granularities, executing multi-target instructions, and interpreting implicit user intent. To drive progress against these failures, we present LaSeRS, the first large-scale dataset built for comprehensive training and evaluation across four critical dimensions of language-guided segmentation: hierarchical granularity, target multiplicity, reasoning requirements, and linguistic variability. By capturing these dimensions, LaSeRS moves beyond simple commands, providing a benchmark for complex geospatial reasoning. This addresses a critical gap: existing datasets…
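As an illustration only, the four dimensions named in the abstract can be pictured as per-sample annotations. The sketch below is not the LaSeRS schema: the class name, field names, category values, and example instructions are all hypothetical, chosen just to show how a benchmark might be sliced along granularity, target multiplicity, reasoning, and linguistic variation.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    # Hypothetical levels; the actual hierarchy used by LaSeRS is not given here.
    OBJECT = "object"
    PART = "part"


@dataclass(frozen=True)
class Sample:
    """Hypothetical annotation record, one per (image, instruction) pair."""
    instruction: str           # natural-language query
    granularity: Granularity   # hierarchical granularity of the target
    num_targets: int           # target multiplicity (assumed >= 1 in this sketch)
    needs_reasoning: bool      # True if the intent is implicit
    paraphrase_group: int      # samples sharing a group are rewordings of one query


def dimension_counts(samples):
    """Count samples per category along three of the four dimensions."""
    counts = Counter()
    for s in samples:
        counts[f"granularity={s.granularity.value}"] += 1
        counts["multi_target" if s.num_targets > 1 else "single_target"] += 1
        counts["reasoning" if s.needs_reasoning else "explicit"] += 1
    return counts


def paraphrase_groups(samples):
    """Group instructions by paraphrase_group to inspect linguistic variability."""
    groups = {}
    for s in samples:
        groups.setdefault(s.paraphrase_group, []).append(s.instruction)
    return groups


# Made-up examples, not taken from the dataset.
SAMPLES = [
    Sample("segment the airplane", Granularity.OBJECT, 1, False, 0),
    Sample("mask the aircraft", Granularity.OBJECT, 1, False, 0),
    Sample("segment the wings of every airplane", Granularity.PART, 3, False, 1),
    Sample("where could a helicopter land?", Granularity.OBJECT, 1, True, 2),
]

COUNTS = dimension_counts(SAMPLES)
GROUPS = paraphrase_groups(SAMPLES)
```

Slicing evaluation results by such tags is one way a benchmark can report where a model fails (for example, on multi-target or reasoning queries) rather than a single aggregate score.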

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning