SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
Kaiyu Li, Ruixun Liu, Xiangyong Cao, Xueru Bai, Feng Zhou, Deyu Meng, Zhi Wang

TL;DR
This paper introduces a training-free open-vocabulary segmentation method for remote sensing images. It combines a feature upsampler (SimFeatUp) with a token bias correction and reports significant improvements across multiple datasets.
Contribution
Proposes SimFeatUp, a training-free upsampler, and a token subtraction technique to enhance open-vocabulary segmentation in remote sensing images, addressing low-resolution and boundary issues.
Findings
Achieves up to 15.3% improvement over state-of-the-art methods.
Effective across diverse remote sensing tasks and datasets.
Introduces training-free techniques suitable for practical applications.
Abstract
Remote sensing imagery plays an irreplaceable role in fields such as agriculture, water resources, military, and disaster relief. Pixel-level interpretation is a critical aspect of remote sensing image applications; however, a prevalent limitation remains the need for extensive manual annotation. To address this, we introduce open-vocabulary semantic segmentation (OVSS) into the remote sensing context. However, because remote sensing images are sensitive to low-resolution features, the predicted masks exhibit distorted target shapes and ill-fitting boundaries. To tackle this issue, we propose a simple and general upsampler, SimFeatUp, to restore lost spatial information in deep features in a training-free style. Further, based on the observation of the abnormal response of local patch tokens to the [CLS] token in CLIP, we propose to execute a straightforward subtraction operation…
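The subtraction idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract is truncated here, so everything specific in it is an assumption rather than the authors' implementation: the function name `subtract_global_token`, the scaling factor `lam`, and the token layout (the [CLS] token first, followed by the patch tokens) are all hypothetical choices made for illustration.

```python
from typing import List

Vector = List[float]


def subtract_global_token(tokens: List[Vector], lam: float = 1.0) -> List[Vector]:
    """Subtract a scaled [CLS] token from every patch token.

    Assumptions (not taken from the paper summary):
      - tokens[0] is the global [CLS] token, tokens[1:] are patch tokens;
      - `lam` is a hypothetical scaling factor, with 1.0 meaning plain subtraction.
    """
    if not tokens:
        raise ValueError("expected at least the [CLS] token")
    cls_token, patch_tokens = tokens[0], tokens[1:]
    for patch in patch_tokens:
        if len(patch) != len(cls_token):
            raise ValueError("all tokens must share the same dimension")
    # Element-wise: patch - lam * cls, applied to each patch token independently.
    return [
        [p - lam * c for p, c in zip(patch, cls_token)]
        for patch in patch_tokens
    ]


if __name__ == "__main__":
    # Toy example: one [CLS] token and two 2-dimensional patch tokens.
    toy_tokens = [[1.0, 2.0], [3.0, 5.0], [1.0, 2.0]]
    print(subtract_global_token(toy_tokens))
```

In this toy setup, a patch token that is identical to the [CLS] token collapses to zero, which is the intuition behind removing a global bias that is shared by every patch.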
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Language-Image Pre-training
