SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation

Shiqi Huang; Shuting He; Huaiyuan Qin; Bihan Wen

arXiv:2507.12857·cs.CV·July 30, 2025

TL;DR

SCORE is a framework that uses regional and global scene context to improve open-vocabulary remote sensing instance segmentation, so that models can recognize diverse and novel object categories in Earth observation imagery.

Contribution

The paper proposes multi-granularity scene context integration, combining regional and global context, to enhance open-vocabulary instance segmentation in remote sensing images.

Findings

01. Achieves state-of-the-art performance on multiple datasets.

02. Establishes new benchmarks for open-vocabulary remote sensing segmentation.

03. Demonstrates robustness across diverse Earth observation scenarios.

Abstract

Most existing remote sensing instance segmentation approaches are designed for closed-vocabulary prediction, which limits their ability to recognize novel categories or generalize across datasets and restricts their applicability in diverse Earth observation scenarios. To address this, we introduce open-vocabulary (OV) learning for remote sensing instance segmentation. While current OV segmentation models perform well on natural image datasets, their direct application to remote sensing faces challenges such as diverse landscapes, seasonal variations, and the presence of small or ambiguous objects in aerial imagery. To overcome these challenges, we propose SCORE (Scene Context matters in Open-vocabulary REmote sensing instance segmentation), a framework that integrates multi-granularity scene context, i.e., regional context and global…
