Cross-lingual Retrieval for Iterative Self-Supervised Training
Chau Tran, Yuqing Tang, Xian Li, Jiatao Gu

TL;DR
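This paper introduces CRISS, an iterative self-supervised training method that improves cross-lingual alignment and translation quality. It mines sentence pairs using a multilingual model's own encoder outputs, retrains the model on those pairs, and repeats. The method reaches state-of-the-art results on unsupervised translation and on Tatoeba sentence retrieval.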
Contribution
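The paper presents CRISS, an iterative training approach that alternates two steps: mining sentence pairs with a multilingual pretrained seq2seq model's encoder, and retraining that model on the mined pairs. Cross-lingual alignment and translation ability therefore improve together.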
Findings
Achieved state-of-the-art unsupervised translation on 9 language directions, with an average improvement of 2.4 BLEU.
Improved Tatoeba sentence retrieval (XTREME benchmark) on 16 languages by an average of 21.5 points of absolute accuracy.
Improved supervised translation by an additional 1.8 BLEU on average.
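Tatoeba retrieval accuracy is commonly computed as top-1 nearest-neighbour accuracy: each source sentence counts as correct if its most similar target-side sentence is its gold translation. The sketch below assumes cosine similarity over fixed-size sentence embeddings and assumes that `src_embs[i]` and `tgt_embs[i]` are translations of each other. The function names and the toy vectors are illustrative only. They are not CRISS outputs or the XTREME evaluation code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieval_accuracy(src_embs, tgt_embs):
    """Top-1 retrieval accuracy, assuming src_embs[i] is the translation of tgt_embs[i]."""
    correct = 0
    for i, s in enumerate(src_embs):
        # Index of the target sentence with the highest cosine similarity to s.
        best = max(range(len(tgt_embs)), key=lambda j: cosine(s, tgt_embs[j]))
        correct += int(best == i)
    return correct / len(src_embs)

# Toy 2-d "embeddings", aligned by index.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.6]]
print(retrieval_accuracy(src, tgt))  # every source retrieves its own translation
```

Under this reading, a gain of "21.5% in absolute accuracy" is a difference of 21.5 percentage points in this score, averaged over the 16 languages.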
Abstract
Recent studies have demonstrated the cross-lingual alignment ability of multilingual pretrained language models. In this work, we found that the cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs. We utilized these findings to develop a new approach -- cross-lingual retrieval for iterative self-supervised training (CRISS), where mining and training processes are applied iteratively, improving cross-lingual alignment and translation ability at the same time. Using this method, we achieved state-of-the-art unsupervised machine translation results on 9 language directions with an average improvement of 2.4 BLEU, and on the Tatoeba sentence retrieval task in the XTREME benchmark on 16 languages with an average improvement of 21.5% in absolute accuracy. Furthermore, CRISS also brings an additional 1.8 BLEU…
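The mine-then-train cycle in the abstract can be sketched as follows. The abstract states only that pairs are mined "using their own encoder outputs." This sketch assumes a ratio-margin criterion of the kind used in LASER-style bitext mining: cosine similarity divided by the average similarity of each side to its k nearest neighbours. That criterion, the `threshold` value, and the callables `encode` and `train` are hypothetical placeholders, not the paper's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_mean(x, pool, k):
    """Mean cosine similarity between x and its k most similar vectors in pool."""
    sims = sorted((cosine(x, z) for z in pool), reverse=True)[:k]
    return sum(sims) / len(sims)

def mine_pairs(src_embs, tgt_embs, k=2, threshold=1.05):
    """Assumed ratio-margin mining: return (i, j, score) for each source whose best match clears threshold."""
    src_bg = [knn_mean(x, tgt_embs, k) for x in src_embs]  # neighbourhood density of each source
    tgt_bg = [knn_mean(y, src_embs, k) for y in tgt_embs]  # neighbourhood density of each target
    pairs = []
    for i, x in enumerate(src_embs):
        scores = [cosine(x, y) / ((src_bg[i] + tgt_bg[j]) / 2) for j, y in enumerate(tgt_embs)]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs

def criss_loop(encode, train, model, mono_src, mono_tgt, iterations=3):
    """Hypothetical outer loop: re-encode with the current model, mine, retrain, repeat."""
    for _ in range(iterations):
        src_embs = [encode(model, s) for s in mono_src]
        tgt_embs = [encode(model, t) for t in mono_tgt]
        mined = [(mono_src[i], mono_tgt[j]) for i, j, _ in mine_pairs(src_embs, tgt_embs)]
        model = train(model, mined)  # placeholder for seq2seq training on mined pairs
    return model
```

Dividing by neighbourhood density, instead of thresholding raw cosine, penalises "hub" sentences that are close to everything. Because each iteration re-encodes with the retrained model, better alignment yields better mined pairs for the next round, which is the feedback the abstract describes.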
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · CRISS · mBART · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam
