Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval

Zhiqiang Yuan, Wenkai Zhang, Kun Fu, Xuan Li, Chubo Deng, Hongqi Wang, and Xian Sun

arXiv:2204.09868 · cs.CV · April 22, 2022

TL;DR

This paper introduces an asymmetric multimodal feature matching network (AMFMN) for cross-modal remote sensing image retrieval. The model handles multi-scale visual features and filters redundant ones to improve retrieval accuracy, and it is evaluated on a newly constructed fine-grained dataset.

Contribution

The paper proposes the AMFMN model, which combines multi-scale visual self-attention with a dynamic triplet loss, and contributes RSITMD, a challenging fine-grained dataset for RS text-image retrieval.

Findings

01 Achieves state-of-the-art results on four RS text-image datasets.

02 Effectively filters redundant features and handles multi-scale visual information.

03 Demonstrates robustness on a newly constructed fine-grained dataset.

Abstract

Remote sensing (RS) cross-modal text-image retrieval has attracted extensive attention for its advantages of flexible input and efficient query. However, traditional methods ignore the multi-scale and redundant targets characteristic of RS images, which degrades retrieval accuracy. To cope with the problem of multi-scale scarcity and target redundancy in the RS multimodal retrieval task, we propose a novel asymmetric multimodal feature matching network (AMFMN). Our model adapts to multi-scale feature inputs, favors multi-source retrieval methods, and can dynamically filter redundant features. AMFMN employs the multi-scale visual self-attention (MVSA) module to extract the salient features of an RS image and uses the visual features to guide the text representation. Furthermore, to alleviate the positive-sample ambiguity caused by the strong intraclass similarity in RS…

Code & Models

Repositories

xiaoyuan1996/AMFMN
PyTorch · Official

Taxonomy

Methods: Triplet Loss
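
For readers unfamiliar with the triplet loss named above, the following is a minimal sketch of the generic bidirectional hinge-based triplet ranking loss commonly used in image-text retrieval. It is not the paper's dynamic variant, whose details are not given on this page. The function names, the cosine similarity measure, and the fixed margin of 0.2 are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_ranking_loss(image_embs, text_embs, margin=0.2):
    """Generic bidirectional triplet loss with a fixed margin (assumed value).

    image_embs[i] and text_embs[i] are treated as a matching pair; every
    other pairing in the batch is treated as a negative.
    """
    n = len(image_embs)
    # sim[i][j] = similarity between image i and text j
    sim = [[cosine(image_embs[i], text_embs[j]) for j in range(n)]
           for i in range(n)]
    loss = 0.0
    for i in range(n):
        positive = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Image i should score higher with its own text than with text j.
            loss += max(0.0, margin - positive + sim[i][j])
            # Text i should score higher with its own image than with image j.
            loss += max(0.0, margin - positive + sim[j][i])
    return loss


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(triplet_ranking_loss(images, aligned_texts))
    print(triplet_ranking_loss(images, swapped_texts))
```

With perfectly aligned embeddings every hinge term is inactive and the loss is zero; with the texts swapped, each negative outscores its positive and every term contributes. A dynamic-margin variant would replace the constant `margin` with a value that changes per sample or over training.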