Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Xiao Pan, Mingxuan Wang, Liwei Wu, Lei Li

TL;DR
This paper introduces mRASP2, a contrastive learning-based training method that enhances many-to-many multilingual translation, especially improving non-English directions by aligning cross-language representations.
Contribution
It proposes a contrastive learning objective, combined with data augmentation on parallel and monolingual data, for training a single unified multilingual translation model, significantly improving translation quality on non-English directions.
Findings
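Outperforms the best existing unified multilingual model on English-centric directions.
Improves non-English directions by more than 10 BLEU on average over the multilingual Transformer baseline.
Performs competitively with, or better than, the pre-trained and fine-tuned mBART on multiple translation directions.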
Abstract
Existing multilingual machine translation approaches mainly focus on English-centric directions, while the non-English directions still lag behind. In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English language directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mRASP2, a training method to obtain a single unified multilingual translation model. mRASP2 is empowered by two techniques: a) a contrastive learning scheme to close the gap among representations of different languages, and b) data augmentation on both multiple parallel and monolingual data to further align token representations. For English-centric directions, mRASP2 outperforms the existing best unified model and achieves competitive or even better…
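The contrastive scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the InfoNCE-style form, the cosine similarity, the temperature value, and the names contrastive_loss, src_vecs, and tgt_vecs are all assumptions made for illustration. The sketch assumes that sentence vectors have already been produced by some encoder, treats each translation pair as a positive, and treats the other target sentences in the batch as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(src_vecs, tgt_vecs, temperature=0.1):
    """InfoNCE-style loss over a batch of sentence vectors (hypothetical helper).

    src_vecs[i] and tgt_vecs[i] are assumed to encode a translation pair
    (the positive); every tgt_vecs[j] with j != i acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(src_vecs):
        logits = [cosine(anchor, t) / temperature for t in tgt_vecs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true translation.
        total += log_denom - logits[i]
    return total / len(src_vecs)


# Toy batch: two "source" sentences and their "translations" as 2-d vectors.
aligned_src = [[1.0, 0.0], [0.0, 1.0]]
aligned_tgt = [[0.9, 0.1], [0.1, 0.9]]
# Same targets in the wrong order, so each positive is far from its anchor.
shuffled_tgt = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = contrastive_loss(aligned_src, aligned_tgt)
loss_shuffled = contrastive_loss(aligned_src, shuffled_tgt)
print(loss_aligned < loss_shuffled)
```

The loss is small when each source vector is closest to its own translation and large when it is closer to another sentence, which is the sense in which minimizing it pulls representations of different languages together.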
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Contrastive Learning · Absolute Position Encodings · Position-Wise Feed-Forward Layer · mBART · Byte Pair Encoding · Adam · Label Smoothing · Residual Connection
