Machine Translation Advancements of Low-Resource Indian Languages by Transfer Learning
Bin Wei, Jiawei Zhen, Zongyao Li, Zhanglin Wu, Daimeng Wei, Jiaxin Guo, Zhiqiang Rao, Shaojun Li, Yuanchang Luo, Hengchao Shang, Jinlong Yang, Yuhao Xie, Hao Yang

TL;DR
This paper explores transfer learning strategies to improve machine translation for low-resource Indian languages, achieving significant BLEU score improvements through fine-tuning existing models and multilingual training.
Contribution
It introduces tailored transfer learning approaches for low-resource Indian languages, demonstrating effective fine-tuning and multilingual training methods that enhance translation quality.
Findings
Achieved BLEU scores up to 47.9 for certain language pairs.
Demonstrated the effectiveness of transfer learning for low-resource languages.
Provided new benchmarks for Indian language machine translation.
Abstract
This paper introduces the submission by Huawei Translation Center (HW-TSC) to the WMT24 Indian Languages Machine Translation (MT) Shared Task. To develop a reliable machine translation system for low-resource Indian languages, we employed two distinct knowledge transfer strategies, taking into account the characteristics of the language scripts and the support available from existing open-source models for Indian languages. For Assamese (as) and Manipuri (mn), we fine-tuned the existing IndicTrans2 open-source model to enable bidirectional translation between English and these languages. For Khasi (kh) and Mizo (mz), we trained a multilingual model as a baseline using bilingual data from these four language pairs, along with about 8kw additional English-Bengali bilingual data, all of which share certain linguistic features. This was followed by fine-tuning to achieve bidirectional…
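As a rough illustration of the two strategies named in the abstract, the sketch below routes each language to a strategy and mixes several bilingual corpora into one bidirectional training set for a multilingual baseline. The strategy labels, the function names, and the `<2xx>` target-language tag are assumptions made for this example; the abstract does not say how the authors marked translation direction or organized their data.

```python
# Hypothetical routing of each language code to the strategy the abstract
# describes for it. The label strings are illustrative, not from the paper.
STRATEGY = {
    "as": "finetune_indictrans2",
    "mn": "finetune_indictrans2",
    "kh": "multilingual_baseline_then_finetune",
    "mz": "multilingual_baseline_then_finetune",
}


def tag_pair(src, tgt, tgt_lang):
    """Prefix the source with a target-language token.

    The "<2xx>" tag format is an assumed convention, commonly used in
    multilingual NMT; the paper's actual scheme is not stated.
    """
    return f"<2{tgt_lang}> {src}", tgt


def build_multilingual_corpus(corpora):
    """Mix bilingual corpora into one bidirectional training list.

    corpora maps (src_lang, tgt_lang) to a list of (src, tgt) sentence pairs.
    Each pair is emitted in both directions so that a single model can
    translate either way.
    """
    mixed = []
    for (src_lang, tgt_lang), pairs in corpora.items():
        for src, tgt in pairs:
            mixed.append(tag_pair(src, tgt, tgt_lang))  # forward direction
            mixed.append(tag_pair(tgt, src, src_lang))  # reverse direction
    return mixed
```

Calling `build_multilingual_corpus({("en", "kh"): [("source text", "target text")]})` yields one tagged example per direction. A subsequent fine-tuning stage would then continue training on the pairs for a single language only.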
Taxonomy
Topics: Natural Language Processing Techniques
