Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study
Aniruddha Roy, Pretam Ray, Ayush Maheshwari, Sudeshna Sarkar, Pawan Goyal

TL;DR
This paper presents a novel framework combining a multilingual encoder and knowledge distillation to improve low-resource NMT, especially for languages not supported by existing models like mBART-50, demonstrating significant BLEU and chrF improvements.
Contribution
It introduces a new seq2seq framework utilizing a multilingual encoder and knowledge distillation to enhance translation quality for low-resource languages beyond existing pre-trained models.
Findings
Significant BLEU-4 and chrF improvements over baselines.
Effective translation for low-resource Indic languages.
Human evaluation confirms the approach's effectiveness.
Abstract
Neural Machine Translation (NMT) remains a formidable challenge, especially when dealing with low-resource languages. Pre-trained sequence-to-sequence (seq2seq) multi-lingual models, such as mBART-50, have demonstrated impressive performance in various low-resource NMT tasks. However, their pre-training has been confined to 50 languages, leaving out support for numerous low-resource languages, particularly those spoken in the Indian subcontinent. Expanding mBART-50's language support requires complex pre-training, risking performance decline due to catastrophic forgetting. Considering these expanding challenges, this paper explores a framework that leverages the benefits of a pre-trained language model along with knowledge distillation in a seq2seq architecture to facilitate translation for low-resource languages, including those not covered by mBART-50. The proposed framework employs a…
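The abstract names knowledge distillation but is cut off before it describes the objective, so the following is only a generic sketch of soft-target distillation for a single output token, not the paper's actual loss. The function names (`softmax`, `distillation_loss`), the `temperature` and `alpha` parameters, and the way the two terms are mixed are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature (assumed parameter)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """Hypothetical per-token loss: a weighted sum of a soft-target and a hard-target term.

    Soft term: KL(teacher || student) on temperature-softened distributions,
    scaled by temperature**2 as is common in distillation setups.
    Hard term: cross-entropy of the student against the reference token.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    cross_entropy = -math.log(softmax(student_logits)[target_index])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * cross_entropy


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # toy teacher logits over a 3-word vocabulary
    student = [1.0, 1.0, 0.0]    # toy student logits
    print(distillation_loss(student, teacher, target_index=0))
```

In a seq2seq setting this per-token quantity would typically be averaged over the target positions of each sentence; how the paper weights or schedules the two terms is not stated in the visible text.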
Taxonomy
Topics: Neural Networks and Applications · Robotics and Automated Systems · Parallel Computing and Optimization Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence · Knowledge Distillation
