TL;DR
This paper introduces a 1-to-K contrastive learning approach for cross-lingual cross-modal retrieval, addressing inconsistency issues across languages and modalities, and proposes a new evaluation metric to measure rank variance.
Contribution
The paper presents a 1-to-K contrastive learning method that treats all languages equally, reducing the intra-modal error propagation and inter-modal optimization direction bias that affect existing CCR models.
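To make the 1-to-K idea concrete, here is a minimal sketch of one way such a loss could look: each image is contrasted against its K captions (one per language) at once, with every caption in the batch sharing a single softmax denominator. The function name `one_to_k_contrastive_loss`, the temperature value, and the multi-positive InfoNCE form are assumptions made for illustration; the paper's exact formulation may differ.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def one_to_k_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Illustrative image-to-text loss with K positives per image (assumed form).

    image_embs: list of N image vectors.
    text_embs:  list of N lists, each holding K text vectors (one per language).
    """
    n = len(image_embs)
    total = 0.0
    for i, image in enumerate(image_embs):
        # Similarity of image i to every caption of every instance in the batch.
        logits = [[dot(image, t) / temperature for t in text_embs[j]] for j in range(n)]
        flat = [x for row in logits for x in row]
        # Numerically stable log-sum-exp over all N*K captions.
        peak = max(flat)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in flat))
        # All K captions of instance i are positives and are weighted equally.
        positives = logits[i]
        total += -sum(x - log_denominator for x in positives) / len(positives)
    return total / n


# Toy batch: 2 images, 2 languages, 2-dimensional embeddings (made-up values).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
swapped = [aligned[1], aligned[0]]  # captions attached to the wrong images
```

In this toy batch the loss is lower for `aligned` than for `swapped`. Because the K positives share one denominator, no language acts as a pivot for the others, which is one reading of "treats all languages equally".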
Findings
Improves recall rates across multiple datasets.
Reduces rank variance across languages.
Achieves state-of-the-art performance with less pre-training data.
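The rank-variance finding can be illustrated with a small sketch. It assumes that, for each test instance, we already know the rank of the correct item when the query is issued in each language; the score is the variance of those ranks across languages, averaged over instances. The function name `rank_variance_across_languages` and this exact definition are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def rank_variance_across_languages(ranks):
    """ranks[i][k] is the rank of the correct item for instance i in language k.

    Returns the per-instance population variance of ranks, averaged over
    instances. A value of 0 means every language retrieves each instance at
    the same rank.
    """
    return sum(pvariance(per_language) for per_language in ranks) / len(ranks)


# Made-up ranks for 2 instances queried in 3 languages.
consistent = [[1, 1, 1], [4, 4, 4]]
inconsistent = [[1, 3, 8], [2, 2, 5]]
```

Under this definition `consistent` scores 0 even though the second instance is ranked fourth, so the measure captures consistency across languages rather than recall itself.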
Abstract
Cross-lingual Cross-modal Retrieval (CCR) is an essential task in web search, which aims to break the barriers between modality and language simultaneously and achieve image-text retrieval in the multi-lingual scenario with a single model. In recent years, excellent progress has been made based on cross-lingual cross-modal pre-training; in particular, methods based on contrastive learning on large-scale data have significantly improved retrieval tasks. However, these methods directly follow the existing pre-training methods in the cross-lingual or cross-modal domain, leading to two problems of inconsistency in CCR: the methods with cross-lingual style suffer from intra-modal error propagation, resulting in inconsistent recall performance across languages in the whole dataset. The methods with cross-modal style suffer from inter-modal optimization direction bias, resulting in…
Taxonomy
Methods: Contrastive Learning
