Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
Hailong Ning, Siying Wang, Tao Lei, Xiaopeng Cao, Huanmin Dou, Bin Zhao, Asoke K. Nandi, Petia Radeva

TL;DR
This paper introduces a novel Representation Discrepancy Bridging method for remote sensing image-text retrieval, addressing cross-modal imbalance with asymmetric adapters and dual-task optimization, leading to significant performance improvements.
Contribution
It proposes a Cross-Modal Asymmetric Adapter and a Dual-Task Consistency Loss to enhance feature alignment and robustness in RSITR models, surpassing existing PEFT methods.
Findings
Achieves 6%-11% improvement in mR metrics over state-of-the-art PEFT methods.
Outperforms fully fine-tuned GeoRSCLIP by 1.15%-2% in retrieval performance.
Demonstrates effectiveness on RSICD and RSITMD datasets.
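The mR figures above can be made concrete with a small sketch. It assumes the convention common in image-text retrieval work, where mR is the average of Recall@1, Recall@5, and Recall@10 over both retrieval directions; the summary does not spell out the paper's exact evaluation protocol. The function names `recall_at_k` and `mean_recall` are hypothetical, and the sketch assumes a single correct candidate per query, whereas benchmark datasets may pair several captions with each image.

```python
from typing import Dict, Sequence


def recall_at_k(
    sim: Sequence[Sequence[float]],
    gt: Sequence[int],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Recall@K in percent.

    sim[i][j] is the similarity between query i and candidate j;
    gt[i] is the index of the single correct candidate for query i
    (a simplifying assumption, see the note above).
    """
    hits = {k: 0 for k in ks}
    for row, target in zip(sim, gt):
        # Zero-based rank: how many candidates score strictly higher than the target.
        rank = sum(1 for s in row if s > row[target])
        for k in ks:
            if rank < k:
                hits[k] += 1
    n = len(gt)
    return {k: 100.0 * hits[k] / n for k in ks}


def mean_recall(i2t: Dict[int, float], t2i: Dict[int, float]) -> float:
    """Average all Recall@K values from both retrieval directions."""
    values = list(i2t.values()) + list(t2i.values())
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy 3x3 image-to-text similarity matrix; caption i matches image i.
    sim = [[0.9, 0.1, 0.2],
           [0.3, 0.2, 0.8],
           [0.1, 0.7, 0.4]]
    gt = [0, 1, 2]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-image direction
    i2t = recall_at_k(sim, gt, ks=(1, 2))
    t2i = recall_at_k(sim_t, gt, ks=(1, 2))
    print(i2t, t2i, mean_recall(i2t, t2i))
```

Because mR averages six recall values in the usual setup, a gain of a few points means a consistent improvement across cutoffs and directions rather than a jump in one number.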
Abstract
Remote Sensing Image-Text Retrieval (RSITR) plays a critical role in geographic information interpretation, disaster monitoring, and urban planning by establishing semantic associations between images and textual descriptions. Existing Parameter-Efficient Fine-Tuning (PEFT) methods for Vision-and-Language Pre-training (VLP) models typically adopt symmetric adapter structures for exploring cross-modal correlations. However, the strongly discriminative nature of the text modality may dominate the optimization process and inhibit image representation learning. This non-negligible imbalance in cross-modal optimization remains a bottleneck to improving model performance. To address this issue, this study proposes a Representation Discrepancy Bridging (RDB) method for the RSITR task. On the one hand, a Cross-Modal Asymmetric Adapter (CMAA) is designed to enable modality-specific optimization and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Remote-Sensing Image Classification
Methods: Softmax · Attention Is All You Need · ADaptive gradient method with the OPTimal convergence rate · Adapter
