Decouple and Rectify: Semantics-Preserving Structural Enhancement for Open-Vocabulary Remote Sensing Segmentation

Jie Feng; Fengze Li; Junpeng Zhang; Siyu Chen; Yuping Liang; Junying Chen; Ronghua Shang

arXiv:2604.02010 · cs.CV · April 3, 2026

TL;DR

This paper introduces DR-Seg, a framework that decouples CLIP features into semantic and structural components, enabling targeted enhancement for open-vocabulary remote sensing segmentation.

Contribution

The paper proposes a decouple-and-rectify approach that preserves CLIP's semantics while improving structural delineation in open-vocabulary remote sensing segmentation.

Findings

1. DR-Seg achieves state-of-the-art results across eight benchmarks.

2. Decoupling CLIP features improves boundary delineation without semantic disruption.

3. Graph rectification with structural priors enhances segmentation accuracy.

Abstract

Open-vocabulary semantic segmentation in the remote sensing (RS) field requires both language-aligned recognition and fine-grained spatial delineation. Although CLIP offers robust semantic generalization, its globally aligned visual representations inherently struggle to capture structural details. Recent methods attempt to compensate for this by introducing RS-pretrained DINO features. However, these methods treat CLIP representations as a monolithic semantic space and cannot localize where structural enhancement is required, so they fail to delineate boundaries effectively while risking the disruption of CLIP's semantic integrity. To address this limitation, we propose DR-Seg, a novel decouple-and-rectify framework. Our method is motivated by the key observation that CLIP feature channels exhibit distinct functional heterogeneity rather than forming a uniform semantic space.…
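To make the decouple-and-rectify idea concrete, the toy sketch below smooths only a chosen set of "structural" channels over a patch-affinity graph and leaves the remaining "semantic" channels untouched. It is an illustration under stated assumptions, not the DR-Seg implementation: the abstract does not say how channels are separated or how the graph is built, so the function names (`affinity`, `rectify`), the hand-picked `structural_channels` list, the cosine-softmax affinity, and the `temperature` parameter are all hypothetical. The `prior_feats` argument stands in for structural prior features such as those from an RS-pretrained DINO encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def affinity(prior_feats, temperature=0.1):
    """Row-normalized (softmax) patch affinity built from prior features.

    Assumption: cosine similarity plus softmax is one simple way to turn
    structural prior features into a graph; the paper may use another.
    """
    rows = []
    for u in prior_feats:
        logits = [cosine(u, v) / temperature for v in prior_feats]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def rectify(clip_feats, prior_feats, structural_channels, temperature=0.1):
    """Propagate only the structural channels over the affinity graph.

    Channels not listed in structural_channels are copied unchanged, which
    is the "semantics-preserving" part of this sketch.
    """
    graph = affinity(prior_feats, temperature)
    out = [list(row) for row in clip_feats]
    for i, weights in enumerate(graph):
        for c in structural_channels:
            out[i][c] = sum(w * clip_feats[j][c] for j, w in enumerate(weights))
    return out


# Three patches, two channels: channel 0 is treated as semantic and
# channel 1 as structural (a hypothetical split chosen by hand).
clip_feats = [[1.0, 0.0], [1.0, 4.0], [0.0, 10.0]]
# Patches 0 and 1 share the same prior feature; patch 2 differs.
prior_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
out = rectify(clip_feats, prior_feats, structural_channels=[1])
```

In this example, patches 0 and 1 look alike under the prior, so their structural channel is averaged toward a common value, while patch 2 keeps a value close to its own. Channel 0 is identical before and after, which mirrors the goal of enhancing structure without disturbing the language-aligned part of the representation.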
