Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

Haochen Han; Qinghua Zheng; Guang Dai; Minnan Luo; Jingdong Wang

arXiv:2403.05105 · cs.CV · March 11, 2024 · 1 citation

TL;DR

This paper introduces L2RM, a novel framework using Optimal Transport to rematch mismatched pairs in cross-modal retrieval, thereby improving robustness against noisy data and enhancing retrieval accuracy.

Contribution

L2RM is the first to leverage semantic similarity among unpaired samples for rematching mismatched pairs using a partial OT approach with a self-supervised cost function.
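
For orientation, the textbook form of partial OT is given below. This is the standard formulation and not necessarily the exact variant used in the paper; the symbols (cost matrix C, plan T, marginals a and b, transported mass s) are notation assumed here for illustration.

```latex
\begin{aligned}
\min_{T \ge 0} \quad & \langle T, C \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} T_{ij} C_{ij} \\
\text{s.t.} \quad & T \mathbf{1}_m \le a, \qquad T^{\top} \mathbf{1}_n \le b, \qquad \mathbf{1}_n^{\top} T \mathbf{1}_m = s,
\end{aligned}
\qquad 0 \le s \le \min\left(\lVert a \rVert_1, \lVert b \rVert_1\right)
```

Because only a fraction s of the total mass has to be moved, samples with no good counterpart can be left untransported instead of being forced into a match.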

Findings

01. L2RM significantly improves robustness against mismatched pairs.

02. The method outperforms existing models on three benchmark datasets.

03. Rematching reduces the negative impact of noisy data in cross-modal retrieval.

Abstract

Collecting well-matched multimedia datasets is crucial for training cross-modal retrieval models. However, in real-world scenarios, massive multimodal data are harvested from the Internet and inevitably contain Partially Mismatched Pairs (PMPs). Such semantically irrelevant data significantly harm cross-modal retrieval performance. Previous efforts tend to mitigate this problem by estimating a soft correspondence to down-weight the contribution of PMPs. In this paper, we aim to address this challenge from a new perspective: the potential semantic similarity among unpaired samples makes it possible to excavate useful knowledge from mismatched pairs. To achieve this, we propose L2RM, a general framework based on Optimal Transport (OT) that learns to rematch mismatched pairs. In detail, L2RM aims to generate refined alignments by seeking a minimal-cost transport plan…
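
The idea of rematching through a minimal-cost transport plan can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed cosine-distance cost in place of the learned, self-supervised cost, and uses balanced entropic OT (Sinkhorn iterations) in place of partial OT. The names `sinkhorn_plan` and `rematch` and the parameters `reg` and `n_iters` are hypothetical.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic-regularized OT plan between two uniform marginals.

    Assumption: balanced OT with uniform weights; the paper describes a
    partial OT variant, which this sketch does not implement.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass on the first modality
    b = np.full(m, 1.0 / m)          # uniform mass on the second modality
    K = np.exp(-cost / reg)          # Gibbs kernel of the cost matrix
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def rematch(image_emb, text_emb, reg=0.1):
    """Return the transport plan and, per image, the index of its best text.

    Assumption: cost is cosine distance between L2-normalized embeddings.
    """
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    cost = 1.0 - image_emb @ text_emb.T
    plan = sinkhorn_plan(cost, reg=reg)
    return plan, plan.argmax(axis=1)
```

Calling `rematch` on a batch whose captions have been shuffled returns a plan that concentrates mass on the semantically closest image-text pairs, so the row-wise argmax can serve as a refined alignment target in place of the noisy original pairing.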

Code & Models

Repositories

hhc1997/l2rm (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications