The Missing Ingredient in Zero-Shot Neural Machine Translation
Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey

TL;DR
This paper identifies the limitations of current multilingual NMT models in zero-shot translation and introduces auxiliary encoder losses to improve their ability to generalize to unseen language pairs, achieving state-of-the-art results.
Contribution
The paper proposes a novel auxiliary loss method on the encoder to enhance zero-shot translation in multilingual NMT models, without harming supervised performance.
Findings
Zero-shot translation quality improves substantially, with no regression on supervised directions.
On WMT14 English-French/German, zero-shot performance is on par with pivoting.
The approach scales to multiple languages, as demonstrated on IWSLT 2017.
Abstract
Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to train such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French/German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017…
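The abstract does not spell out the form of the auxiliary encoder losses. As a rough illustration only, one way to encourage representational invariance is to penalize the cosine distance between pooled encoder outputs of a sentence and its translation. The sketch below assumes mean pooling, a cosine-distance penalty, and a scalar `weight`; the function names `mean_pool`, `alignment_loss`, and `total_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np

def mean_pool(states, mask):
    """Average encoder states over non-padding positions.

    states: (batch, time, dim) encoder outputs.
    mask:   (batch, time), 1 for real tokens and 0 for padding.
    """
    mask = mask[..., None].astype(states.dtype)
    return (states * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1.0)

def alignment_loss(src_states, src_mask, tgt_states, tgt_mask, eps=1e-8):
    """Mean cosine distance between pooled encodings of parallel sentences.

    Assumption: this is one possible invariance penalty, not necessarily
    the exact loss used by the authors.
    """
    a = mean_pool(src_states, src_mask)
    b = mean_pool(tgt_states, tgt_mask)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    cos = (a * b).sum(axis=-1) / (norms + eps)
    return float((1.0 - cos).mean())

def total_loss(translation_loss, aux_loss, weight=1.0):
    """Combine the usual translation loss with the weighted auxiliary term."""
    return translation_loss + weight * aux_loss
```

In this sketch the auxiliary term is zero when the two pooled encodings point in the same direction and grows as they diverge, so minimizing it alongside the translation loss pushes the encoder toward language-independent sentence representations.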
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
