TL;DR
This paper introduces a robust method for remote sensing image-text retrieval that effectively handles noisy data by employing a self-paced learning strategy and a robust triplet loss, significantly improving performance on benchmark datasets.
Contribution
The paper proposes a novel RRSITR framework with a self-paced learning approach and a robust triplet loss to address noisy correspondence in remote sensing image-text retrieval.
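The two ingredients named above can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes a generic hard-threshold self-paced weighting and a standard hinge triplet loss with hardest in-batch negatives, and every name in it (`triplet_losses`, `self_paced_weights`, `weighted_loss`, `margin`, `threshold`) is hypothetical.

```python
from typing import List


def triplet_losses(sim: List[List[float]], margin: float = 0.2) -> List[float]:
    """Per-pair bidirectional hinge triplet loss using hardest negatives.

    sim[i][j] is the similarity between image i and text j. The diagonal
    holds the annotated (possibly noisy) pairs. Requires at least 2 pairs.
    """
    n = len(sim)
    losses = []
    for i in range(n):
        pos = sim[i][i]
        # Hardest negative text for image i, hardest negative image for text i.
        hard_text = max(sim[i][j] for j in range(n) if j != i)
        hard_image = max(sim[j][i] for j in range(n) if j != i)
        losses.append(
            max(0.0, margin + hard_text - pos)
            + max(0.0, margin + hard_image - pos)
        )
    return losses


def self_paced_weights(losses: List[float], threshold: float) -> List[float]:
    """Assumed hard self-paced rule: keep only pairs with loss below threshold.

    Raising the threshold over training admits harder pairs later,
    which gives an easy-to-hard curriculum.
    """
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def weighted_loss(losses: List[float], weights: List[float]) -> float:
    """Average the loss over the pairs currently selected by the weights."""
    kept = sum(weights)
    if kept == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / kept


# Toy batch: pairs 0 and 1 are well matched, pair 2 looks mismatched.
sim = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.1],
    [0.5, 0.3, 0.1],
]
losses = triplet_losses(sim)
weights = self_paced_weights(losses, threshold=0.5)
batch_loss = weighted_loss(losses, weights)
```

In this toy batch the third pair has a large loss, so the weights come out as `[1.0, 1.0, 0.0]` and the suspect pair is excluded from the update. A soft weighting (for example, one that decays with the loss) would be an equally plausible choice under the same assumptions.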
Findings
Outperforms state-of-the-art methods on benchmark datasets.
Effectively handles high noise rates in data.
Improves retrieval accuracy with noisy and mismatched data.
Abstract
As a pivotal task that bridges remote visual and linguistic understanding, Remote Sensing Image-Text Retrieval (RSITR) has attracted considerable research interest in recent years. However, almost all RSITR methods implicitly assume that image-text pairs are perfectly matched. In practice, acquiring a large set of well-aligned data pairs is often prohibitively expensive or even infeasible. We also observe that remote sensing datasets (e.g., RSITMD) do contain some inaccurate or mismatched image-text descriptions. Based on these observations, we reveal an important but unexplored problem in RSITR, i.e., Noisy Correspondence (NC). To overcome this challenge, we propose a novel Robust Remote Sensing Image-Text Retrieval (RRSITR) paradigm that designs a self-paced learning strategy to mimic human cognitive learning patterns, thereby learning from easy to hard from…
