Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
Yuan Yuan, Yang Zhan, and Zhitong Xiong

TL;DR
This paper introduces a parameter-efficient transfer learning framework for remote sensing image-text retrieval, reducing training costs significantly while maintaining or improving performance compared to traditional fine-tuning methods.
Contribution
It proposes a novel parameter-efficient transfer learning (PETL) framework that combines a remote sensing adapter with a hybrid contrastive loss, demonstrating effective knowledge transfer with a minimal number of trainable parameters.
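The summary does not spell out how the hybrid contrastive loss is built, so the following is only a minimal sketch of the standard symmetric (CLIP-style) image-text contrastive loss that such objectives typically start from. It is not the authors' formulation. The function names, the `temperature` default, and the assumption that `sim[i][j]` holds the similarity between image `i` and caption `j` (with matching pairs on the diagonal) are all illustrative choices.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(sim, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy.

    sim: square list of lists; sim[i][j] is the similarity between
         image i and caption j. Matching pairs sit on the diagonal
         (an assumption of this sketch).
    temperature: hypothetical default, not taken from the paper.
    """
    n = len(sim)
    scaled = [[s / temperature for s in row] for row in sim]

    # Image-to-text: each row should put its mass on the diagonal entry.
    i2t = -sum(log_softmax(scaled[i])[i] for i in range(n)) / n

    # Text-to-image: same computation over the columns.
    cols = [[scaled[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n

    return 0.5 * (i2t + t2i)
```

A similarity matrix whose diagonal dominates yields a loss near zero, while a matrix with no preference for the matching pairs yields roughly `log(n)`, which is why minimizing this quantity pulls matched image-text pairs together.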
Findings
Achieves 98.9% parameter reduction compared to full fine-tuning.
Outperforms traditional methods by 7-13% in retrieval accuracy.
Matches or exceeds the performance of full fine-tuning with far fewer trainable parameters.
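To make the arithmetic behind a parameter-reduction figure concrete, here is a small sketch of how the trainable-parameter count of a generic bottleneck adapter (a down-projection followed by an up-projection, each with a bias) compares with a full model. The adapter layout, the function names, and any dimensions passed in are assumptions for illustration; the paper's remote sensing adapter may be structured differently, and no figures from the paper are reproduced here.

```python
def bottleneck_adapter_params(hidden_dim, bottleneck_dim):
    """Parameter count of a generic two-layer bottleneck adapter.

    Assumes a down-projection (hidden_dim -> bottleneck_dim) and an
    up-projection (bottleneck_dim -> hidden_dim), each with a bias.
    """
    down = hidden_dim * bottleneck_dim + bottleneck_dim
    up = bottleneck_dim * hidden_dim + hidden_dim
    return down + up


def parameter_reduction(trainable, full):
    """Percentage of parameters that no longer need to be trained."""
    return 100.0 * (1.0 - trainable / full)
```

Because the adapter cost grows with `hidden_dim * bottleneck_dim` rather than with the size of the whole backbone, keeping the bottleneck narrow is what drives the trainable share down.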
Abstract
Vision-and-language pre-training (VLP) models have experienced a surge in popularity recently. By fine-tuning them on specific datasets, significant performance improvements have been observed in various tasks. However, full fine-tuning of VLP models not only consumes a significant amount of computational resources but also has a significant environmental impact. Moreover, as remote sensing (RS) data is constantly being updated, full fine-tuning may not be practical for real-world applications. To address this issue, in this work, we investigate the parameter-efficient transfer learning (PETL) method to effectively and efficiently transfer visual-language knowledge from the natural domain to the RS domain on the image-text retrieval task. To this end, we make the following contributions. 1) We construct a novel and sophisticated PETL framework for the RS image-text retrieval (RSITR)…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training
