Few-Shot Remote Sensing Image Scene Classification with CLIP and Prompt Learning
Ivica Dimitrovski, Vlatko Spasev, Ivan Kitanovski

TL;DR
This paper explores prompt learning strategies to adapt CLIP for few-shot remote sensing image scene classification, addressing domain gaps and improving performance with minimal labeled data.
Contribution
It systematically evaluates various prompt learning methods for remote sensing, demonstrating their effectiveness over standard baselines in few-shot and cross-domain scenarios.
Findings
Prompt learning outperforms zero-shot CLIP and linear probes in few-shot settings.
Self-Regulating Constraints enhance cross-domain robustness.
Prompting methods improve adaptation to diverse remote sensing datasets.
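The few-shot setting in these findings means each class contributes only a handful of labeled training images. As a rough illustration only (the summary does not describe the paper's actual sampling protocol), the sketch below draws a K-shot subset from a labeled dataset. The function name sample_k_shot, the seed handling, and the example class names are assumptions made for this sketch.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed=0):
    """Return sorted dataset indices containing exactly k examples per class.

    Hypothetical helper: `labels` is a list with one class label per image.
    """
    # Group dataset indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < k:
            raise ValueError(f"class {label!r} has only {len(pool)} examples")
        # Sample k distinct indices from this class without replacement.
        chosen.extend(rng.sample(pool, k))
    return sorted(chosen)


# Illustrative labels only; not taken from any dataset in the paper.
labels = ["airport", "forest", "harbor"] * 10
subset = sample_k_shot(labels, k=4, seed=0)
```

Sampling per class rather than over the whole dataset keeps the subset balanced, which is the usual meaning of "K-shot".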
Abstract
Remote sensing applications increasingly rely on deep learning for scene classification. However, their performance is often constrained by the scarcity of labeled data and the high cost of annotation across diverse geographic and sensor domains. Recent vision-language models such as CLIP have shown promise by aligning visual and textual modalities to learn transferable representations at scale, but their direct application to remote sensing remains suboptimal because of significant domain gaps and the need for task-specific semantic adaptation. To address this challenge, we systematically explore prompt learning as a lightweight and efficient adaptation strategy for few-shot remote sensing image scene classification. We evaluate several representative methods, including Context Optimization, Conditional Context Optimization, Multi-modal Prompt Learning, and Prompting with…
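To make the idea of prompt learning concrete, here is a minimal forward-pass sketch in the spirit of Context Optimization: a small set of shared context vectors is prepended to each class-name embedding, the result is passed through a text encoder, and classes are scored by cosine similarity with the image feature. This is a sketch under stated assumptions, not the paper's implementation. The mean-pooling toy_text_encoder merely stands in for CLIP's frozen text encoder, the scale of 100.0 is an assumed temperature, and every function name and vector below is hypothetical. In real prompt learning the context vectors would be updated by gradient descent on a cross-entropy loss while both encoders stay frozen; that training step is omitted here.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def toy_text_encoder(token_vectors):
    """Assumed stand-in for a frozen text encoder: mean-pool the tokens."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(tok[d] for tok in token_vectors) / count for d in range(dim)]


def prompt_logits(image_feature, context, class_embeddings, scale=100.0):
    """Score each class as scaled cosine(image, encode(context + class))."""
    img = normalize(image_feature)
    logits = []
    for cls in class_embeddings:
        # The same context vectors are shared across all class prompts.
        text = normalize(toy_text_encoder(context + [cls]))
        logits.append(scale * sum(a * b for a, b in zip(img, text)))
    return logits


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4-dimensional embeddings for three classes.
class_embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
# Two context vectors with small values (the trainable part in practice).
context = [[0.02, -0.01, 0.01, 0.0], [-0.01, 0.02, 0.0, 0.01]]
# An image feature that points mostly toward the second class.
image_feature = [0.1, 0.9, 0.2, 0.0]

probs = softmax(prompt_logits(image_feature, context, class_embeddings))
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Because only the context vectors would be trained, the number of learnable parameters stays very small compared with fine-tuning the whole model, which is why the abstract calls the approach lightweight.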
