TL;DR
This paper introduces a cross-modal remote sensing text-image retrieval framework that combines global and local image features, improving retrieval accuracy through dynamic feature fusion and enhanced local representations.
Contribution
The paper proposes GaLR, a remote sensing cross-modal text-image retrieval (RSCTIR) framework with a multi-level information dynamic fusion (MIDF) module and enhanced local feature extraction using DREA, and reports state-of-the-art results.
Findings
GaLR outperforms existing methods on public datasets.
The DREA module improves local feature quality.
The multivariate rerank algorithm enhances retrieval precision.
Abstract
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent research hotspot due to its ability of enabling fast and flexible information extraction on remote sensing (RS) images. However, current RSCTIR methods mainly focus on global features of RS images, which leads to the neglect of local features that reflect target relationships and saliency. In this article, we first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to efficaciously integrate features of different levels. MIDF leverages local information to correct global information, utilizes global information to supplement local information, and uses the dynamic addition of the two to generate prominent visual representation. To alleviate the pressure of the redundant targets on the graph convolution…
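The "dynamic addition" of global and local features mentioned in the abstract can be illustrated with a small gated-fusion sketch. This is not the paper's MIDF module: the abstract does not give its equations, so the scalar sigmoid gate, the function name `dynamic_fusion`, and the parameters `gate_weights` and `gate_bias` are all assumptions made for illustration. It shows only the weighted-sum step, not the correction and supplementation steps that the abstract also describes.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_fusion(global_feat, local_feat, gate_weights, gate_bias=0.0):
    """Blend a global and a local feature vector with an input-dependent gate.

    Hypothetical illustration only: the gate is a single scalar computed
    from the concatenation of both vectors, which is an assumed design and
    not the formulation used in the paper.
    """
    if len(global_feat) != len(local_feat):
        raise ValueError("global and local features must have the same length")

    concat = list(global_feat) + list(local_feat)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")

    # The gate depends on both inputs, so the mixing ratio changes per sample.
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    alpha = sigmoid(score)

    # Weighted sum: alpha favours the global vector, (1 - alpha) the local one.
    fused = [alpha * g + (1.0 - alpha) * l
             for g, l in zip(global_feat, local_feat)]
    return fused, alpha


if __name__ == "__main__":
    g = [1.0, 0.0]   # toy global feature
    l = [0.0, 1.0]   # toy local feature
    fused, alpha = dynamic_fusion(g, l, gate_weights=[0.0, 0.0, 0.0, 0.0])
    print(alpha, fused)
```

With all gate weights set to zero the gate is neutral and the result is the plain average of the two vectors. In a trained model the gate parameters would be learned, so the balance between global and local information would vary from image to image.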
Taxonomy
Methods: Graph Convolutional Network · Convolution
