NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task
Pramit Sahoo, Maharaj Brahma, Maunendra Sankar Desarkar

TL;DR
This paper presents a low-resource Indic machine translation system for the WMT24 shared task. It fine-tunes a pre-trained model whose pre-training objective uses alignment augmentation, and reports chrF2 scores between 42.3 and 66.3 on the official public test set across four language pairs.
Contribution
It introduces a language-specific fine-tuning approach with alignment augmentation for low-resource Indic languages in machine translation.
Findings
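Achieved chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng→as, eng→kha, eng→lus, and eng→mni, respectively.
Explored multilingual training with and without language grouping and layer freezing.
Released code, models, and generated translations for reproducibility.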
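To make the reported metric concrete, here is a minimal sketch of a sentence-level chrF score with beta = 2 (the "2" in chrF2), written in plain Python. It is a simplified illustration: whitespace is dropped, character n-gram precision and recall are averaged over orders 1 to 6, and the two are combined into an F-score that weights recall more heavily. The function names are hypothetical, and the numbers it produces are not guaranteed to match the shared task's official scorer.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_order=6, beta=2.0):
    """Simplified sentence-level chrF on a 0-100 scale (beta=2 gives chrF2)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One side is too short to contain n-grams of this order.
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: with beta=2, recall counts four times as much as precision.
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(round(chrf("the cat sat", "the cat sat on the mat"), 1))
```

Because beta is 2, a hypothesis that covers the whole reference but adds extra material scores higher than one that is precise but incomplete.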
Abstract
In this paper, we describe our system for the WMT24 shared task on Low-Resource Indic Language Translation. We consider eng↔{as, kha, lus, mni} as participating language pairs. In this shared task, we explore fine-tuning a pre-trained model, motivated by the pre-training objective of bringing embeddings closer together through alignment augmentation (Lin et al., 2020) for 22 scheduled Indian languages. Our primary system is based on language-specific fine-tuning of a pre-trained model. We achieve chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng→as, eng→kha, eng→lus, and eng→mni, respectively. We also explore multilingual training with and without language grouping and layer freezing. Our code, models, and generated translations are available at https://github.com/pramitsahoo/WMT2024-LRILT.
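The abstract does not spell out how alignment augmentation is applied. One common reading, in the spirit of the random aligned substitution of Lin et al. (2020), is to replace some source words with dictionary translations so that words with the same meaning appear in the same contexts. The sketch below illustrates that idea only. The function name, the replacement probability, and the two-entry English-to-Hindi toy dictionary are assumptions made for illustration and are not taken from the paper.

```python
import random


def alignment_augment(tokens, bilingual_dict, replace_prob=0.3, rng=None):
    """Randomly swap tokens for dictionary translations (hypothetical sketch).

    tokens: list of source-side words.
    bilingual_dict: maps a lowercased source word to a list of translations.
    replace_prob: chance of replacing a word that has a dictionary entry.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the example is reproducible
    augmented = []
    for token in tokens:
        candidates = bilingual_dict.get(token.lower())
        if candidates and rng.random() < replace_prob:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(token)
    return augmented


if __name__ == "__main__":
    # Toy dictionary for illustration only (assumed, not from the paper).
    toy_dict = {"water": ["पानी"], "house": ["घर"]}
    sentence = "the house has no water".split()
    print(alignment_augment(sentence, toy_dict, replace_prob=1.0))
```

Setting `replace_prob` to 1.0 replaces every word that has a dictionary entry, while 0.0 leaves the sentence unchanged. Values in between produce code-switched training sentences of the same length as the input.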
