Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning

Yabing Wang; Jianfeng Dong; Tianxiang Liang; Minsong Zhang; Rui Cai; Xun Wang

arXiv:2208.12526·cs.CV·August 29, 2022

TL;DR

This paper introduces a noise-robust cross-lingual cross-modal retrieval method for low-resource languages, leveraging machine translation and self-distillation to improve retrieval accuracy without extra labeled data.

Contribution

The paper proposes a multi-view self-distillation approach, combined with cross-attention and back-translation, to make retrieval in low-resource languages more robust to translation noise.

Findings

1. Significant performance improvements on three cross-modal retrieval benchmarks.
2. Effective noise reduction in textual embeddings from machine translation.
3. Compatibility with pre-trained vision-and-language models like CLIP.

Abstract

Despite the recent developments in the field of cross-modal retrieval, there has been less research focusing on low-resource languages due to the lack of manually annotated datasets. In this paper, we propose a noise-robust cross-lingual cross-modal retrieval method for low-resource languages. To this end, we use Machine Translation (MT) to construct pseudo-parallel sentence pairs for low-resource languages. However, as MT is not perfect, it tends to introduce noise during translation, rendering textual embeddings corrupted and thereby compromising the retrieval performance. To alleviate this, we introduce a multi-view self-distillation method to learn noise-robust target-language representations, which employs a cross-attention module to generate soft pseudo-targets to provide direct supervision from the similarity-based view and feature-based view. Besides, inspired by the…
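
To make the two distillation views named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. Every function name is hypothetical. The `teacher_emb` argument stands in for the output of the cross-attention module, which is not reproduced here. The KL divergence, the squared distance, and the temperature of 0.1 are illustrative assumptions rather than details taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    # Fall back to 1.0 so an all-zero vector does not divide by zero.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def softmax(scores, temperature=1.0):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_distribution(text_emb, visual_embs, temperature=0.1):
    """Softmax over cosine similarities between one text and all candidates."""
    t = l2_normalize(text_emb)
    sims = [dot(t, l2_normalize(v)) for v in visual_embs]
    return softmax(sims, temperature)

def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def similarity_view_loss(teacher_emb, student_emb, visual_embs, temperature=0.1):
    """Similarity-based view (assumed form): the teacher's text-to-visual
    similarity distribution acts as a soft pseudo-target for the student."""
    soft_targets = similarity_distribution(teacher_emb, visual_embs, temperature)
    student_probs = similarity_distribution(student_emb, visual_embs, temperature)
    return kl_divergence(soft_targets, student_probs)

def feature_view_loss(teacher_emb, student_emb):
    """Feature-based view (assumed form): pull the student embedding toward
    the teacher embedding directly, here with a squared distance."""
    t, s = l2_normalize(teacher_emb), l2_normalize(student_emb)
    return sum((a - b) ** 2 for a, b in zip(t, s))

# Toy data: three candidate visual embeddings, a clean teacher text embedding,
# and a noisy target-language embedding (e.g. from a machine-translated caption).
visual = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
clean = [0.9, 0.1]
noisy = [0.4, 0.6]

sim_loss = similarity_view_loss(clean, noisy, visual)
feat_loss = feature_view_loss(clean, noisy)
print(f"similarity-view loss: {sim_loss:.4f}, feature-view loss: {feat_loss:.4f}")
```

Both losses are zero when the student embedding matches the teacher and grow as translation noise pushes the target-language embedding away from it. That is the intuition behind using soft pseudo-targets as direct supervision.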

Code & Models

Repositories

huiguanlab/nrccr (PyTorch, official)

Taxonomy

Methods: Contrastive Language-Image Pre-training · Concatenated Skip Connection · Softmax