TL;DR
Supervised fine-tuning on a small amount of labeled satellite imagery outperforms prompting for cloud segmentation under domain shift, suggesting that labeled data matters more than prompt engineering in specialized remote sensing tasks.
Contribution
This study demonstrates that supervised fine-tuning with small amounts of labeled data surpasses prompting approaches for satellite cloud segmentation, challenging the reliance on prompting in domain-specific applications.
Findings
All 60 prompt variants underperform the zero-shot baseline (0.255 mIoU) in satellite cloud segmentation.
Supervised fine-tuning with 0.1% labeled data surpasses zero-shot performance.
Fine-tuning outperforms low-rank adaptation, especially for spectrally ambiguous classes.
Abstract
Adapting vision-language models to remote sensing imagery presents a fundamental challenge: both the visual and linguistic distributions of satellite data lie far outside natural image pretraining corpora. Despite this, prompting remains the dominant deployment paradigm, driven by the assumption that domain-specific language can guide frozen model representations toward specialized tasks. We test this assumption directly on a domain where the mismatch is prominent: cloud segmentation for satellite imagery. Using CLIPSeg on the CloudSEN12+ cloud segmentation benchmark, we evaluate 60 prompt variants spanning simple labels, domain terminology, appearance descriptors, and contextual cues, finding that every variant underperforms the zero-shot baseline (0.255 mIoU), with engineered prompts scoring as low as 0.07 mIoU. No amount of linguistic refinement bridges the gap between CLIP's natural…
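The scores quoted in the abstract are mean intersection-over-union (mIoU) values. As a minimal sketch of how that metric is commonly computed, the function below averages per-class IoU over flattened label masks. The name `mean_iou`, the toy masks, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions; the paper's exact evaluation protocol is not stated here.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flattened integer label masks.

    Assumption: classes that appear in neither mask are skipped
    rather than counted as 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both masks agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either mask assigns class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three hypothetical classes on a six-pixel mask.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2. A segmentation benchmark would apply the same computation to full-resolution masks, possibly aggregating per image or over the whole dataset.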
