PriorCLIP: Visual Prior Guided Vision-Language Model for Remote Sensing Image-Text Retrieval
Jiancheng Pan, Muyuan Ma, Qing Ma, Cong Bai, Shengyong Chen

TL;DR
PriorCLIP introduces visual priors and a progressive attention framework to improve remote sensing image-text retrieval across closed and open domains, addressing semantic noise and domain shifts.
Contribution
It proposes a novel visual prior-guided vision-language model with progressive attention encoders and a two-stage learning strategy for robust retrieval.
Findings
Outperforms existing methods by 4.9% and 4.0% in closed-domain retrieval.
Achieves 7.3% and 9.4% improvements in open-domain retrieval.
Demonstrates effectiveness on RSICD and RSITMD benchmarks.
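On RSICD and RSITMD, retrieval results are commonly reported as Recall@K: the share of queries whose ground-truth match appears among the top K ranked candidates. The summary above does not say which metric its percentages refer to. The sketch below is a minimal, self-contained illustration of image-to-text Recall@K over cosine similarities. It assumes one caption per image, with text i being the match for image i. The function names `cosine` and `recall_at_k` are hypothetical and are not taken from the paper's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def recall_at_k(image_embs, text_embs, k):
    """Image-to-text Recall@K (in percent).

    Assumption for this sketch: text i is the single ground-truth caption for image i.
    """
    hits = 0
    for i, img in enumerate(image_embs):
        scores = [cosine(img, txt) for txt in text_embs]
        # Rank all candidate texts by descending similarity to this image.
        ranked = sorted(range(len(text_embs)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(image_embs)

# Toy embeddings: image 1's true caption ranks second, so it misses at K=1.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
texts = [[0.9, 0.1], [1.0, 0.2], [0.5, 0.5]]
print(recall_at_k(images, texts, 1))  # 2 of 3 images hit at K=1
print(recall_at_k(images, texts, 2))  # all 3 hit at K=2
```

In practice, benchmarks report both retrieval directions (image-to-text and text-to-image) at several values of K. Images in these datasets also have multiple captions, so the single-match assumption is a simplification.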
Abstract
Remote sensing image-text retrieval plays a crucial role in remote sensing interpretation, yet remains challenging under both closed-domain and open-domain scenarios due to semantic noise and domain shifts. To address these issues, we propose a visual prior-guided vision-language model, PriorCLIP, which leverages visual priors for unbiased representation learning and adaptive vision-language alignment. In the closed-domain setting, PriorCLIP introduces two Progressive Attention Encoder (PAE) structures: Spatial-PAE constructs a belief matrix with instruction embeddings to filter key features and mitigate semantic bias. At the same time, Temporal-PAE exploits cyclic activation across time steps to enhance text representation. For the open-domain setting, we design a two-stage prior representation learning strategy, consisting of large-scale pre-training on coarse-grained image-text…
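The abstract says Spatial-PAE builds a belief matrix from instruction embeddings and uses it to filter key features. It does not give the formulation. The sketch below is a toy, hypothetical version of that idea and not the paper's method. It assumes that belief is the dot product between each visual feature and each instruction embedding, softmax-normalized over features. It also assumes that a feature's overall belief is its strongest support from any instruction, and that "filtering" means keeping the top-k features. The names `belief_matrix` and `filter_key_features` and the top-k rule are all illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def belief_matrix(features, instructions):
    """Hypothetical belief matrix B[i][j]: how strongly feature i supports instruction j.

    Assumption: dot-product scores, softmax-normalized over features for each instruction.
    """
    scores = [[sum(a * b for a, b in zip(f, q)) for q in instructions] for f in features]
    cols = [softmax([scores[i][j] for i in range(len(features))])
            for j in range(len(instructions))]
    return [[cols[j][i] for j in range(len(instructions))] for i in range(len(features))]

def filter_key_features(features, instructions, k):
    """Return indices (ascending) of the k features with the highest belief.

    Assumption: a feature's belief is its maximum entry across instruction embeddings.
    """
    B = belief_matrix(features, instructions)
    belief = [max(row) for row in B]
    keep = sorted(range(len(features)), key=lambda i: belief[i], reverse=True)[:k]
    return sorted(keep)

# Two informative features aligned with the instructions, plus one weak "noise" feature.
feats = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
instr = [[1.0, 0.0], [0.0, 1.0]]
print(filter_key_features(feats, instr, 2))  # keeps features 0 and 1, drops the weak one
```

A hard top-k cut is used here only because it makes the "filtering" step easy to see. A learned model would more likely use soft weighting so that the selection stays differentiable.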
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
