PriorCLIP: Visual Prior Guided Vision-Language Model for Remote Sensing Image-Text Retrieval

Jiancheng Pan; Muyuan Ma; Qing Ma; Cong Bai; Shengyong Chen

arXiv:2405.10160 · cs.CV · September 11, 2025 · 1 citation

TL;DR

PriorCLIP introduces visual priors and a progressive attention framework to improve remote sensing image-text retrieval across closed and open domains, addressing semantic noise and domain shifts.

Contribution

It proposes a novel visual prior-guided vision-language model with progressive attention encoders and a two-stage learning strategy for robust retrieval.

Findings

01. Outperforms existing methods by 4.9% and 4.0% in closed-domain retrieval.

02. Achieves 7.3% and 9.4% improvements in open-domain retrieval.

03. Demonstrates effectiveness on RSICD and RSITMD benchmarks.

Abstract

Remote sensing image-text retrieval plays a crucial role in remote sensing interpretation, yet it remains challenging in both closed-domain and open-domain scenarios due to semantic noise and domain shifts. To address these issues, we propose a visual prior-guided vision-language model, PriorCLIP, which leverages visual priors for unbiased representation learning and adaptive vision-language alignment. In the closed-domain setting, PriorCLIP introduces two Progressive Attention Encoder (PAE) structures. Spatial-PAE constructs a belief matrix with instruction embeddings to filter key features and mitigate semantic bias, while Temporal-PAE exploits cyclic activation across time steps to enhance text representation. For the open-domain setting, we design a two-stage prior representation learning strategy, consisting of large-scale pre-training on coarse-grained image-text…
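The abstract does not say how the Spatial-PAE belief matrix is computed, so the following is only an illustrative sketch, not the authors' implementation. It assumes the belief matrix is a row-wise softmax over scaled dot products between instruction embeddings and image patch features, and that "filtering key features" means keeping the patches with the highest mean belief. The function names (`belief_matrix`, `filter_key_features`), the `keep` parameter, and the toy vectors are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def belief_matrix(instructions, patches):
    """Assumed form: one row per instruction embedding, one column per patch.

    Each row is a softmax over scaled dot products, so it sums to 1.
    """
    scale = math.sqrt(len(patches[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, p)) / scale for p in patches])
        for q in instructions
    ]


def filter_key_features(instructions, patches, keep):
    """Keep the `keep` patches with the highest mean belief (hypothetical rule)."""
    belief = belief_matrix(instructions, patches)
    n = len(patches)
    scores = [sum(row[j] for row in belief) / len(belief) for j in range(n)]
    top = sorted(range(n), key=lambda j: scores[j], reverse=True)[:keep]
    kept = sorted(top)  # preserve the original patch order
    return kept, [patches[j] for j in kept]


# Toy data: two instruction embeddings, four 2-D patch features.
instructions = [[1.0, 0.0], [0.0, 1.0]]
patches = [[2.0, 0.0], [0.0, 0.1], [0.0, 2.0], [0.1, 0.0]]

kept_idx, kept_patches = filter_key_features(instructions, patches, keep=2)
print(kept_idx)  # [0, 2]
```

In this toy case the two strongly activated patches (indices 0 and 2) are kept and the two weak ones are dropped. A real encoder would operate on learned, batched tensors and would likely use the belief scores inside an attention layer rather than as a hard top-k cut.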

Code & Models

Repositories

jaychempan/pir-clip (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques