Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study

Aniruddha Roy; Pretam Ray; Ayush Maheshwari; Sudeshna Sarkar; Pawan Goyal

arXiv:2407.06538 · cs.CL · July 10, 2024 · 1 citation

TL;DR

This paper presents a framework that combines a multilingual encoder with knowledge distillation to improve low-resource NMT, especially for languages that existing pre-trained models such as mBART-50 do not support, and reports significant BLEU and chrF improvements.
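The TL;DR cites chrF alongside BLEU. As a rough illustration of what chrF measures, here is a minimal, simplified sentence-level version, assuming whitespace is ignored, character n-gram orders 1 to 6, and β = 2 (recall weighted above precision). The function names are hypothetical and this is not the evaluation code used in the paper; reported scores should come from a reference implementation such as sacreBLEU, whose exact averaging may differ.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over the usable n-gram orders.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta score: beta > 1 weights recall more heavily than precision.
    return 100 * (1 + b2) * p * r / (b2 * p + r)
```

For example, `chrf("the cat", "the cat")` returns 100.0, while strings that share no characters score 0.0.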

Contribution

The paper introduces a seq2seq framework that uses a multilingual encoder and knowledge distillation to improve translation quality for low-resource languages, including those beyond the coverage of existing pre-trained models.

Findings

1. Significant BLEU-4 and chrF improvements over baselines.
2. Effective translation for low-resource Indic languages.
3. Human evaluation confirms the approach's effectiveness.
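BLEU-4 combines clipped n-gram precisions for n = 1 to 4 through a geometric mean and multiplies the result by a brevity penalty. The sketch below is a simplified corpus-level version under stated assumptions (one reference per sentence, whitespace tokenization, no smoothing); `corpus_bleu4` is a hypothetical helper, not the paper's scorer, and tokenization choices alone can shift real BLEU scores noticeably.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references):
    """Corpus-level BLEU-4 with one reference per hypothesis and no smoothing."""
    matches = [0] * 4  # clipped n-gram matches per order
    totals = [0] * 4   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, 5):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            matches[n - 1] += sum((h_ng & r_ng).values())
            totals[n - 1] += sum(h_ng.values())
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

A hypothesis that matches its reference except for a dropped final word (five tokens against six) keeps all four precisions at 1 but receives a brevity penalty of e^(-0.2), for a score of about 81.9.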

Abstract

Neural Machine Translation (NMT) remains a formidable challenge, especially when dealing with low-resource languages. Pre-trained sequence-to-sequence (seq2seq) multilingual models, such as mBART-50, have demonstrated impressive performance in various low-resource NMT tasks. However, their pre-training has been confined to 50 languages, leaving numerous low-resource languages unsupported, particularly those spoken in the Indian subcontinent. Expanding mBART-50's language support requires complex pre-training, risking performance decline due to catastrophic forgetting. Given these challenges, this paper explores a framework that leverages the benefits of a pre-trained language model along with knowledge distillation in a seq2seq architecture to facilitate translation for low-resource languages, including those not covered by mBART-50. The proposed framework employs a…
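The abstract pairs a pre-trained model with knowledge distillation, but the description of the exact distillation setup is cut off. For orientation only, here is a generic word-level distillation loss for a single decoder step: hard-label cross-entropy on the reference token mixed with a temperature-softened KL term that pulls the student toward a teacher's distribution. The mixing weight `alpha`, the temperature, and all function names are assumptions for illustration; this is not presented as the paper's formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, gold_index, alpha=0.5, temperature=2.0):
    """Mix hard-label cross-entropy with a soft-target KL term for one target token."""
    # Hard loss: negative log-likelihood of the reference token under the student.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[gold_index])
    # Soft loss: KL(teacher || student) on temperature-softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft
```

With `alpha=0.0` the loss reduces to ordinary cross-entropy; when student and teacher logits coincide, the soft term is zero.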

Videos

Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study · Underline

Taxonomy

Topics: Neural Networks and Applications · Robotics and Automated Systems · Parallel Computing and Optimization Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence · Knowledge Distillation
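The Methods tags list sigmoid and tanh activations together with LSTM. As a generic illustration of how those pieces relate, and not a description of this paper's architecture, one LSTM time step can be sketched as follows. The parameter layout (`params` mapping each gate name to input weights, recurrent weights, and a bias) and the helper names are assumptions chosen for readability.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. params maps gate name -> (W_x, W_h, b) (hypothetical layout)."""
    def pre(gate):
        W_x, W_h, b = params[gate]
        return [wx + wh + bias for wx, wh, bias in zip(matvec(W_x, x), matvec(W_h, h_prev), b)]

    i = [sigmoid(z) for z in pre("input")]    # input gate: how much new content to write
    f = [sigmoid(z) for z in pre("forget")]   # forget gate: how much old cell state to keep
    o = [sigmoid(z) for z in pre("output")]   # output gate: how much cell state to expose
    g = [math.tanh(z) for z in pre("cell")]   # candidate cell content in (-1, 1)
    c = [fi * cp + ii * gi for fi, cp, ii, gi in zip(f, c_prev, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c
```

With all-zero parameters every gate outputs 0.5 and the candidate is 0, so the new cell state is simply half of the previous one.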