NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task

Pramit Sahoo; Maharaj Brahma; Maunendra Sankar Desarkar

arXiv:2410.03215 · cs.CL · June 11, 2025

TL;DR

This paper presents a low-resource Indic language translation system for WMT24 that fine-tunes a model pre-trained with alignment augmentation, reaching chrF2 scores between 42.3 and 66.3 when translating from English into Assamese, Khasi, Mizo, and Manipuri.

Contribution

It applies language-specific fine-tuning to a model pre-trained with alignment augmentation, targeting low-resource Indic languages in machine translation.
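As a rough illustration of alignment augmentation, the sketch below follows the random aligned substitution (RAS) idea from Lin et al. (2020): source words that have an entry in a bilingual lexicon are swapped, with some probability, for one of their translations, so that translation-equivalent words appear in shared contexts during pre-training. The function name `random_aligned_substitution`, the toy lexicon, and the substitution probability `p` are hypothetical; this is a sketch of the general technique, not the authors' pre-training code.

```python
import random
from typing import Dict, List


def random_aligned_substitution(
    tokens: List[str],
    lexicon: Dict[str, List[str]],
    p: float = 0.3,
    seed: int = 0,
) -> List[str]:
    """RAS-style augmentation sketch (hypothetical helper).

    Each token with a lexicon entry is replaced, with probability p,
    by one of its dictionary translations; other tokens are kept.
    """
    rng = random.Random(seed)  # seeded for reproducible augmentation
    augmented = []
    for tok in tokens:
        candidates = lexicon.get(tok.lower())
        if candidates and rng.random() < p:
            # Swap in a translation so aligned words share the same context.
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(tok)
    return augmented


# Toy lexicon with placeholder target-language tokens (hypothetical).
lexicon = {"water": ["WATER@as"], "river": ["RIVER@as", "RIVER@mni"]}
print(random_aligned_substitution(["The", "river", "has", "water"], lexicon, p=1.0))
```

The sentence length is unchanged and only lexicon words are touched, so the augmented source can be paired with the original target without re-alignment.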

Findings

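01. Achieved chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng → as, eng → kha, eng → lus, and eng → mni, respectively.
02. Explored multilingual training with and without language grouping and layer freezing.
03. Released code, models, and generated translations for reproducibility.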
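The reported metric, chrF2, is a character n-gram F-score with β = 2, which treats recall as twice as important as precision. Below is a minimal pure-Python sketch, assuming the original chrF formulation (precision and recall averaged over character n-gram orders 1 to 6, whitespace ignored). The `chrf` and `char_ngrams` helpers are illustrative; official WMT scores come from sacreBLEU, whose averaging and corpus-level aggregation may differ, so numbers from this sketch are not comparable to the ones above.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # chrF ignores whitespace by default, so drop spaces before counting.
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Sentence-level chrF sketch; beta=2 gives chrF2."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta combination of averaged precision and recall, scaled to 0-100.
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(chrf("the cat", "the cats"))
```

Because β = 2 favors recall, a hypothesis that is a correct but truncated version of the reference scores lower under chrF2 than under a balanced (β = 1) variant.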

Abstract

In this paper, we describe our system for the WMT24 shared task on Low-Resource Indic Language Translation. We consider eng $\leftrightarrow$ {as, kha, lus, mni} as the participating language pairs. In this shared task, we explore fine-tuning a model pre-trained on 22 scheduled Indian languages whose pre-training objective, alignment augmentation \cite{lin-etal-2020-pre}, brings aligned embeddings closer together. Our primary system is based on language-specific fine-tuning of this pre-trained model. We achieve chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng $\to$ as, eng $\to$ kha, eng $\to$ lus, and eng $\to$ mni, respectively. We also explore multilingual training with and without language grouping and layer freezing. Our code, models, and generated translations are available at https://github.com/pramitsahoo/WMT2024-LRILT.
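The abstract mentions layer freezing during multilingual training. The sketch below shows only the mechanics, assuming fairseq-style parameter names such as `encoder.layers.0.fc1.weight`; the `Param` class and the `freeze_by_prefix` helper are hypothetical stand-ins, and which layers the authors actually froze is not stated here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Param:
    """Hypothetical stand-in for a model parameter (name + trainable flag)."""
    name: str
    requires_grad: bool = True


def freeze_by_prefix(params: Iterable[Param], frozen_prefixes: List[str]) -> List[str]:
    """Freeze every parameter whose name starts with a listed prefix.

    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for p in params:
        p.requires_grad = not any(p.name.startswith(pre) for pre in frozen_prefixes)
        if p.requires_grad:
            trainable.append(p.name)
    return trainable


params = [
    Param("embed_tokens.weight"),
    Param("encoder.layers.0.fc1.weight"),
    Param("decoder.layers.0.fc1.weight"),
]
# Example choice (assumption): freeze embeddings and encoder, train the decoder.
print(freeze_by_prefix(params, ["embed_tokens.", "encoder."]))
```

In PyTorch the same loop would run over `model.named_parameters()` and set each tensor's `requires_grad`. Ending each prefix with a dot avoids `encoder.layers.1` accidentally matching `encoder.layers.10`.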

Code & Models

Repositories

pramitsahoo/wmt2024-lrilt (Official)

Videos

NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task (Underline)

Taxonomy

Topics: Parallel Computing and Optimization Techniques

Methods: Sparse Evolutionary Training