Vision-Language Model Purified Semi-Supervised Semantic Segmentation for Remote Sensing Images
Shanwen Wang, Xin Sun, Danfeng Hong, Fei Zhou

TL;DR
SemiEarth introduces a vision-language model-based pseudo-label purification method to enhance semi-supervised semantic segmentation in remote sensing images, significantly improving label quality and model performance.
Contribution
The paper proposes VLM pseudo-label purifying (VLM-PP) to improve pseudo-label quality in semi-supervised remote sensing segmentation, independent of existing architectures.
Findings
Achieves state-of-the-art results on multiple RS datasets.
Significantly improves pseudo-label accuracy, especially at object boundaries.
Provides enhanced interpretability over previous methods.
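One way to make the boundary-accuracy finding concrete is to score pseudo-labels separately on boundary pixels. The sketch below is illustrative only: the helper names `boundary_mask` and `pseudo_label_accuracy`, the 4-neighbour boundary definition, and the toy arrays are assumptions, not the paper's evaluation protocol.

```python
from typing import List, Optional

Grid = List[List[int]]


def boundary_mask(labels: Grid) -> List[List[bool]]:
    """Mark pixels whose 4-neighbourhood contains a different class (assumed definition)."""
    h, w = len(labels), len(labels[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    mask[y][x] = True
                    break
    return mask


def pseudo_label_accuracy(pseudo: Grid, truth: Grid,
                          mask: Optional[List[List[bool]]] = None) -> float:
    """Fraction of pixels where the pseudo-label matches the ground truth.

    If a mask is given, only pixels where the mask is True are counted.
    """
    correct = total = 0
    for y, row in enumerate(truth):
        for x, t in enumerate(row):
            if mask is not None and not mask[y][x]:
                continue
            total += 1
            correct += int(pseudo[y][x] == t)
    return correct / total if total else float("nan")


# Toy data (made up): two classes split down the middle of a 2x4 tile.
truth = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
pseudo = [[0, 1, 1, 1],
          [0, 0, 0, 1]]

overall_acc = pseudo_label_accuracy(pseudo, truth)
boundary_acc = pseudo_label_accuracy(pseudo, truth, boundary_mask(truth))
```

In this toy case both errors sit on the class boundary, so the boundary-only score is lower than the overall score, which is the kind of gap a purification step would aim to close.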
Abstract
Semi-supervised semantic segmentation (S4) can learn rich visual knowledge from low-cost unlabeled images. However, traditional S4 architectures all face the challenge of low-quality pseudo-labels, especially in the teacher-student framework. We propose a novel SemiEarth model that introduces vision-language models (VLMs) to address these S4 issues in the remote sensing (RS) domain. Specifically, we design a VLM pseudo-label purifying (VLM-PP) structure to purify the teacher network's pseudo-labels, achieving substantial improvements. In multi-class boundary regions of RS images in particular, the VLM-PP module can significantly improve the quality of the pseudo-labels generated by the teacher, thereby correctly guiding the student model's learning. Moreover, since VLM-PP equips VLMs with open-world capabilities and is independent of the S4 architecture, it can correct mispredicted…
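The abstract does not spell out how VLM-PP operates, so the following is only a minimal sketch of the general idea of purifying teacher pseudo-labels before they reach the student. Everything specific here is an assumption: the function `purify_pseudo_labels`, the `vlm_relabel` hook standing in for a vision-language model, the confidence threshold, and the `IGNORE` label convention are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Optional

IGNORE = -1  # assumed convention: pixels the student loss would skip


def purify_pseudo_labels(
    teacher_probs: List[List[List[float]]],            # [H][W][C] softmax outputs
    vlm_relabel: Callable[[int, int], Optional[int]],  # hypothetical VLM hook
    conf_threshold: float = 0.8,                       # assumed value
) -> List[List[int]]:
    """Keep confident teacher labels; defer uncertain pixels to the VLM hook.

    If the hook returns None for an uncertain pixel, the pixel is marked IGNORE.
    """
    purified = []
    for y, row in enumerate(teacher_probs):
        out_row = []
        for x, probs in enumerate(row):
            conf = max(probs)
            label = probs.index(conf)  # teacher's argmax class
            if conf < conf_threshold:
                suggestion = vlm_relabel(y, x)
                label = suggestion if suggestion is not None else IGNORE
            out_row.append(label)
        purified.append(out_row)
    return purified


# Toy 2x2 tile with two classes; the probabilities are made up.
probs = [[[0.9, 0.1], [0.55, 0.45]],
         [[0.4, 0.6], [0.05, 0.95]]]


def toy_vlm(y: int, x: int) -> Optional[int]:
    """Stand-in for a VLM query: only has an opinion about pixel (0, 1)."""
    return 1 if (y, x) == (0, 1) else None


purified = purify_pseudo_labels(probs, toy_vlm)
```

Because the purification step only consumes the teacher's outputs, a sketch like this sits outside the segmentation network itself, which matches the abstract's claim that VLM-PP is independent of the S4 architecture.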
Taxonomy
Topics: Advanced Neural Network Applications · Remote-Sensing Image Classification · Domain Adaptation and Few-Shot Learning
