FarSLIP: Discovering Effective CLIP Adaptation for Fine-Grained Remote Sensing Understanding
Zhenshi Li, Weikang Yu, Dilxat Muhtar, Xueliang Zhang, Pengfeng Xiao, Pedram Ghamisi, Xiao Xiang Zhu

TL;DR
The paper introduces MGRS-200k, a multi-granularity remote sensing (RS) image-text dataset, and FarSLIP, a CLIP adaptation method that strengthens spatial awareness while preserving semantic coherence, significantly improving performance on RS vision-language tasks.
Contribution
The paper constructs MGRS-200k, the first multi-granularity RS image-text dataset with object-level textual supervision, and proposes FarSLIP, a fine-grained CLIP tuning framework that uses patch-to-patch distillation and CLS-token-based region-category alignment.
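The two ingredients named above can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not give the loss definitions, so the cosine-based distillation term, the softmax cross-entropy alignment term, the temperature of 0.07, and every function name and embedding value below are assumptions chosen only for illustration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def patch_distillation_loss(student_patches, teacher_patches):
    # Assumed form: mean (1 - cosine similarity) over spatially matching patch tokens.
    sims = [cosine(s, t) for s, t in zip(student_patches, teacher_patches)]
    return sum(1.0 - c for c in sims) / len(sims)

def region_category_loss(region_cls, category_text, labels, temperature=0.07):
    # Assumed form: softmax cross-entropy between each region's CLS embedding
    # and the text embeddings of all candidate categories.
    total = 0.0
    for emb, label in zip(region_cls, labels):
        logits = [cosine(emb, c) / temperature for c in category_text]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[label]
    return total / len(labels)

# Toy 2-D embeddings (hypothetical values, illustration only).
teacher = [[1.0, 0.0], [0.0, 1.0]]     # patch tokens from a frozen teacher
student = [[0.9, 0.1], [0.2, 0.8]]     # patch tokens from the model being tuned
categories = [[1.0, 0.0], [0.0, 1.0]]  # text embeddings for two categories
regions = [[0.8, 0.2], [0.1, 0.9]]     # CLS embeddings of two cropped regions
labels = [0, 1]                        # ground-truth category index per region

loss = patch_distillation_loss(student, teacher) + region_category_loss(regions, categories, labels)
print(f"combined toy loss: {loss:.4f}")
```

In this sketch the distillation term compares patch tokens only with their spatial counterparts, while the alignment term compares one pooled embedding per region against category names rather than full captions; how FarSLIP weights or combines the two is not stated in this summary.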
Findings
Sets new state-of-the-art on RS semantic segmentation
Improves zero-shot classification accuracy in the RS domain
Enhances image-text retrieval performance
Abstract
As CLIP's global alignment limits its ability to capture fine-grained details, recent efforts have focused on enhancing its region-text alignment. However, current remote sensing (RS)-specific CLIP variants still inherit this limited spatial awareness. We identify two key limitations behind this: (1) current RS image-text datasets generate global captions from object-level labels, leaving the original object-level supervision underutilized; (2) despite the success of region-text alignment methods in the general domain, their direct application to RS data often leads to performance degradation. To address these, we construct the first multi-granularity RS image-text dataset, MGRS-200k, featuring rich object-level textual supervision for RS region-category alignment. We further investigate existing fine-grained CLIP tuning strategies and find that current explicit region-text alignment…
Taxonomy
Topics: Remote-Sensing Image Classification · Domain Adaptation and Few-Shot Learning · Geographic Information Systems Studies
