Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval

Yabing Wang; Shuhui Wang; Hao Luo; Jianfeng Dong; Fan Wang; Meng Han; Xun Wang; Meng Wang

arXiv:2309.05451 · cs.CV · September 12, 2023 · 1 citation

TL;DR

This paper introduces Dual-view Curricular Optimal Transport (DCOT), a novel method for cross-lingual cross-modal retrieval that effectively handles noisy pseudo-parallel data using optimal transport and curriculum learning, improving robustness and generalization.

Contribution

The paper proposes a dual-view optimal transport framework with curriculum learning to better model noisy cross-lingual and cross-modal correspondences in retrieval tasks.
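To make the curriculum idea concrete, here is a minimal Python sketch of a generic confidence-based curriculum over two views. It is not DCOT's actual scheme: the function name `curriculum_weights`, the linear threshold schedule, its start and end values, and the choice to combine the cross-lingual and cross-modal confidences with `min` are all illustrative assumptions. Pairs that look reliable from both views are admitted first, and the admission threshold is relaxed as training proceeds.

```python
def curriculum_weights(conf_lingual, conf_modal, epoch, total_epochs,
                       start_threshold=0.9, end_threshold=0.5):
    """Hypothetical dual-view curriculum: per-pair loss weights in [0, 1].

    conf_lingual / conf_modal: assumed per-pair confidence scores in [0, 1]
    from the cross-lingual and cross-modal views, respectively.
    """
    # Linear pace (an assumption): the threshold moves from start to end over training.
    progress = min(epoch / max(total_epochs - 1, 1), 1.0)
    threshold = start_threshold + (end_threshold - start_threshold) * progress
    weights = []
    for cl, cm in zip(conf_lingual, conf_modal):
        # A pair must look reliable from both views: keep the more pessimistic score.
        c = min(cl, cm)
        # Hard gate: pairs below the current threshold contribute nothing this epoch.
        weights.append(c if c >= threshold else 0.0)
    return weights


if __name__ == "__main__":
    cl = [0.95, 0.7, 0.3]  # illustrative cross-lingual confidences
    cm = [0.92, 0.8, 0.9]  # illustrative cross-modal confidences
    print(curriculum_weights(cl, cm, epoch=0, total_epochs=10))  # [0.92, 0.0, 0.0]
    print(curriculum_weights(cl, cm, epoch=9, total_epochs=10))  # [0.92, 0.7, 0.0]
```

Early in training only the pair that both views agree on is used; by the last epoch the moderately confident pair is admitted, while the third pair, which the cross-lingual view distrusts, stays excluded. A soft weighting in place of the hard gate is an equally plausible design.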

Findings

01

DCOT outperforms baseline methods on multilingual image-text and video-text datasets.

02

The approach demonstrates robustness to noisy pseudo-parallel data.

03

It generalizes well to out-of-domain data.

Abstract

Current research on cross-modal retrieval is mostly English-oriented, owing to the availability of a large number of human-labeled English vision-language corpora. To overcome the scarcity of non-English labeled data, cross-lingual cross-modal retrieval (CCR) has attracted increasing attention. Most CCR methods construct pseudo-parallel vision-language corpora via Machine Translation (MT) to achieve cross-lingual transfer. However, sentences translated by MT are generally imperfect descriptions of the corresponding visual content. Wrongly assuming that the pseudo-parallel pairs are correctly aligned makes the networks overfit to the noisy correspondence. Therefore, we propose Dual-view Curricular Optimal Transport (DCOT) to learn with noisy correspondence in CCR. In particular, we quantify the confidence of the sample pair correlation with optimal transport theory from…
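The idea of scoring pair reliability with optimal transport can be illustrated with entropic OT solved by Sinkhorn iterations. The sketch below is a generic illustration, not the authors' formulation: the cost `1 - similarity`, the uniform marginals, the value of `epsilon`, and the hypothetical `pair_confidence` score (the fraction of row i's transported mass that lands on its own paired column i) are all assumptions made here. Rows stand for images and columns for their machine-translated captions, with pair i matching caption i.

```python
import math


def sinkhorn(cost, epsilon=0.1, n_iters=200):
    """Entropic OT with uniform marginals via Sinkhorn; returns the transport plan."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform source marginal (assumption)
    b = [1.0 / m] * m  # uniform target marginal (assumption)
    # Gibbs kernel K = exp(-C / epsilon); fine for small epsilon only when costs are small.
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternate scaling so row sums match a, then column sums match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def pair_confidence(similarity, epsilon=0.1):
    """Hypothetical score: share of row i's transported mass sent to its paired column i."""
    cost = [[1.0 - s for s in row] for row in similarity]
    plan = sinkhorn(cost, epsilon)
    return [plan[i][i] / sum(plan[i]) for i in range(len(plan))]


if __name__ == "__main__":
    # Illustrative similarities: caption 2 is equally (and weakly) similar to everything.
    sim = [[0.9, 0.1, 0.5],
           [0.1, 0.9, 0.5],
           [0.5, 0.5, 0.5]]
    # The ambiguous third pair scores lower than the first two.
    print(pair_confidence(sim))
```

Because balanced OT with uniform marginals must move every row's mass somewhere, a noisy pair is only down-weighted here rather than rejected; thresholding the score or using an unbalanced OT variant are common ways to go further.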

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques