A Resource-Efficient Training Framework for Remote Sensing Text–Image Retrieval

Weihang Zhang; Jihao Li; Shuoke Li; Ziqing Niu; Jialiang Chen; Wenkai Zhang

arXiv:2501.10638 · cs.CV · January 22, 2025

TL;DR

This paper introduces a resource-efficient framework for remote sensing text-image retrieval that reduces training memory usage and improves retrieval performance through new modules and data augmentation techniques.

Contribution

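The authors propose the computation and memory-efficient retrieval (CMER) framework, which combines the Focus-Adapter module, scene label augmentation, and negative sample recycling to improve efficiency and accuracy in remote sensing text-image retrieval (RSTIR).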

Findings

01. Achieves 2-5% higher retrieval accuracy than recent methods.

02. Reduces memory consumption by 49%.

03. Increases training data throughput by 1.4 times.

Abstract

Remote sensing text–image retrieval (RSTIR) aims to retrieve the matched remote sensing (RS) images from the database according to the descriptive text. Recently, the rapid development of large visual-language pre-training models provides new insights for RSTIR. Nevertheless, as the complexity of models grows in RSTIR, the previous studies suffer from suboptimal resource efficiency during transfer learning. To address this issue, we propose a computation and memory-efficient retrieval (CMER) framework for RSTIR. To reduce the training memory consumption, we propose the Focus-Adapter module, which adopts a side branch structure. Its focus layer suppresses the interference of background pixels for small targets. Simultaneously, to enhance data efficacy, we regard the RS scene category as the metadata and design a concise augmentation technique. The scene label augmentation leverages the…
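
To make the retrieval task in the abstract's first sentence concrete, here is a minimal sketch of ranking images against a text query by cosine similarity. It is not the CMER method. The embeddings, file names, and the helper names `cosine` and `retrieve` are made up for illustration; a real system would obtain the vectors from trained text and image encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(text_vec, image_vecs, k=1):
    """Return the names of the k images most similar to the text embedding."""
    ranked = sorted(
        image_vecs,
        key=lambda name: cosine(text_vec, image_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy, hand-written embeddings (assumption: stand-ins for encoder outputs).
images = {
    "airport.png": [0.9, 0.1, 0.0],
    "harbor.png": [0.1, 0.9, 0.2],
    "farmland.png": [0.0, 0.2, 0.9],
}

# Stand-in for an encoded caption such as "ships docked at a port".
query = [0.2, 0.8, 0.1]

top = retrieve(query, images, k=2)
print(top)
```

With these toy vectors the harbor image ranks first, since its embedding points in nearly the same direction as the query.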

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Focus