Language-Aware Multilingual Machine Translation with Self-Supervised Learning
Haoran Xu, Jean Maillard, Vedanuj Goswami

TL;DR
This paper introduces a novel self-supervised learning approach with intra-distillation for multilingual machine translation, significantly enhancing performance by learning language-specific parameters and co-training with denoising tasks.
Contribution
It proposes a simple yet effective SSL task, concurrent denoising, and combines it with intra-distillation to improve multilingual machine translation (MMT) models.
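This page does not define intra-distillation. Below is a minimal sketch that assumes a particular mechanism. The same input is passed through the model K times, each time with a different dropout mask. An extra loss term then pulls the K output distributions toward one another, so that every sub-network has to contribute.

The following pieces are assumptions for illustration, not details taken from this paper:
- The agreement measure: the average symmetric KL between each pass and the mean of all passes.
- The weight `alpha`.
- The helper names `kl`, `agreement_loss` and `total_loss`.

Plain Python lists stand in for a real training framework.

```python
import math
from typing import List, Sequence


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def agreement_loss(passes: List[Sequence[float]]) -> float:
    """Average symmetric KL between each of the K passes and their mean.

    Each entry of `passes` is assumed to be the output distribution for the same
    token from a forward pass with a different dropout mask (hypothetical setup).
    """
    k = len(passes)
    mean = [sum(col) / k for col in zip(*passes)]
    return sum(kl(p, mean) + kl(mean, p) for p in passes) / k


def total_loss(nll: float, passes: List[Sequence[float]], alpha: float = 1.0) -> float:
    """Translation loss plus a weighted agreement term (alpha is a hypothetical weight)."""
    return nll + alpha * agreement_loss(passes)


# Identical passes agree, so the extra term is (numerically) zero.
print(agreement_loss([[0.7, 0.2, 0.1]] * 3))
# Two passes that disagree strongly are penalized (roughly 0.88 here).
print(agreement_loss([[0.9, 0.1], [0.1, 0.9]]))
```

Measuring agreement against the mean keeps the cost linear in K rather than comparing every pair of passes. The paper may use a different divergence or weighting.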
Findings
Outperforms three state-of-the-art SSL methods by large margins
Achieves 11.3% and 3.7% improvements on 8-language and 15-language benchmarks
Highlights the importance of language-specific parameters in MMT
Abstract
Multilingual machine translation (MMT) benefits from cross-lingual transfer but is a challenging multitask optimization problem. This is partly because there is no clear framework to systematically learn language-specific parameters. Self-supervised learning (SSL) approaches that leverage large quantities of monolingual data (where parallel data is unavailable) have shown promise by improving translation performance as complementary tasks to the MMT task. However, jointly optimizing SSL and MMT tasks is even more challenging. In this work, we first investigate how to utilize intra-distillation to learn more *language-specific* parameters and then show the importance of these language-specific parameters. Next, we propose a novel but simple SSL task, concurrent denoising, that co-trains with the MMT task by concurrently denoising monolingual data on both the encoder and decoder. Finally,…
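The abstract does not spell out the noising scheme. The sketch below shows one plausible reading of "concurrently denoising monolingual data on both the encoder and decoder":
- A monolingual sentence is masked once for the encoder, and the masked positions become the encoder's prediction targets.
- The sentence is masked again, independently, for the decoder. The decoder is fed this noised copy with teacher forcing and must emit the clean sentence.

The mask token, masking ratio, language tag, end-of-sentence token and dictionary field names are all hypothetical and not taken from the paper.

```python
import random
from typing import Dict, List, Optional, Tuple

MASK = "<mask>"  # hypothetical mask token


def mask_tokens(
    tokens: List[str], ratio: float, rng: random.Random
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask a fraction of positions; return noised tokens and per-position targets (None = not masked)."""
    n = max(1, round(ratio * len(tokens))) if tokens else 0
    positions = set(rng.sample(range(len(tokens)), n))
    noised = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = [t if i in positions else None for i, t in enumerate(tokens)]
    return noised, targets


def concurrent_denoising_example(
    tokens: List[str], lang_tag: str, ratio: float = 0.3, seed: int = 0
) -> Dict[str, list]:
    """Build one hypothetical training example from a monolingual sentence."""
    rng = random.Random(seed)
    # Encoder side: masked input; the masked positions are prediction targets.
    enc_input, enc_targets = mask_tokens(tokens, ratio, rng)
    # Decoder side: an independently masked copy is fed with teacher forcing,
    # and the decoder must produce the clean sentence plus an end token.
    dec_noised, _ = mask_tokens(tokens, ratio, rng)
    return {
        "encoder_input": enc_input,
        "encoder_targets": enc_targets,
        "decoder_input": [lang_tag] + dec_noised,
        "decoder_target": tokens + ["</s>"],
    }


example = concurrent_denoising_example("the cat sat on the mat".split(), "<en>")
for key, value in example.items():
    print(key, value)
```

Masking the two sides independently means the encoder and decoder see different corruptions of the same sentence. Whether the paper does this or shares one corruption between them is not stated in the abstract.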
