Multilingual Neural Machine Translation: Can Linguistic Hierarchies Help?
Fahimeh Saleh, Wray Buntine, Gholamreza Haffari, Lan Du

TL;DR
This paper introduces a Hierarchical Knowledge Distillation method for multilingual neural machine translation that leverages language hierarchies to improve translation quality and reduce negative transfer effects across diverse languages.
Contribution
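The paper proposes a Hierarchical Knowledge Distillation (HKD) approach that uses language typology and phylogeny to group languages, then selectively distils knowledge from those groups to improve multilingual neural machine translation (MNMT).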
Findings
Improves translation performance by about 1 BLEU point on average.
Mitigates negative transfer in multilingual models.
Evaluated on a TED dataset covering 53 languages.
Abstract
Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation between multiple languages, rather than training separate models for different languages. Learning a single model can enhance the low-resource translation by leveraging data from multiple languages. However, the performance of an MNMT model is highly dependent on the type of languages used in training, as transferring knowledge from a diverse set of languages degrades the translation performance due to negative transfer. In this paper, we propose a Hierarchical Knowledge Distillation (HKD) approach for MNMT which capitalises on language groups generated according to typological features and phylogeny of languages to overcome the issue of negative transfer. HKD generates a set of multilingual teacher-assistant models via a selective knowledge distillation mechanism based on the language…
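The abstract names knowledge distillation as the core mechanism but is cut off before giving details. As a rough illustration only, the sketch below shows a generic word-level distillation loss for a single target token: the student is trained to match a teacher's output distribution while still fitting the gold token. It is not the paper's HKD procedure and does not model the language-group hierarchy or the selective mechanism. The function names, the mixing weight `alpha`, and the `temperature` parameter are assumptions introduced for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, gold_index,
                      alpha=0.5, temperature=1.0):
    """Hypothetical word-level distillation loss for one target position.

    alpha weights the soft (teacher-matching) term against the hard
    (gold-token) term; both alpha and temperature are illustrative
    assumptions, not values taken from the paper.
    """
    student_probs = softmax(student_logits, temperature)
    teacher_probs = softmax(teacher_logits, temperature)

    # Soft term: cross-entropy of the student under the teacher distribution.
    soft = -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

    # Hard term: negative log-likelihood of the gold token under the student.
    hard = -math.log(softmax(student_logits)[gold_index])

    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # toy vocabulary of three tokens
    agreeing_student = [2.0, 0.5, -1.0]
    disagreeing_student = [-1.0, 0.5, 2.0]
    print(distillation_loss(agreeing_student, teacher, gold_index=0))
    print(distillation_loss(disagreeing_student, teacher, gold_index=0))
```

In this toy setup, a student whose scores agree with the teacher receives a lower loss than one that ranks the tokens in the opposite order. A hierarchical scheme like the one the abstract describes would presumably apply a loss of this kind between teachers, teacher-assistants, and the final model, but that wiring is not specified in the text above.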
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
