Yes-MT's Submission to the Low-Resource Indic Language Translation Shared Task in WMT 2024

Yash Bhaskar; Parameswari Krishnamurthy

arXiv:2512.15226·cs.CL·December 18, 2025

TL;DR

This paper describes the Yes-MT team's systems for low-resource Indic language translation at WMT 2024, covering three families of approaches: fine-tuning pre-trained models, prompting LLMs, and training Transformer models from scratch.

Contribution

It compares several fine-tuning and prompting techniques for low-resource Indic language translation, including LoRA fine-tuning, zero-shot and few-shot prompting of LLMs, and training Transformer models from scratch.

Findings

01 LLMs show potential in low-resource translation tasks.

02 Fine-tuning pre-trained models improves translation quality.

03 Challenges remain due to limited data availability.

Abstract

This paper presents the systems submitted by the Yes-MT team for the Low-Resource Indic Language Translation Shared Task at WMT 2024 (Pakray et al., 2024), focusing on translating between English and the Assamese, Mizo, Khasi, and Manipuri languages. The experiments explored various approaches, including fine-tuning pre-trained models like mT5 (Xue et al., 2020) and IndicBART (Dabre et al., 2021) in both multilingual and monolingual settings, LoRA (Hu et al., 2021) fine-tuning of IndicTrans2 (Gala et al., 2023), zero-shot and few-shot prompting (Brown, 2020) with large language models (LLMs) like Llama 3 (Dubey et al., 2024) and Mixtral 8x7B (Jiang et al., 2024), LoRA supervised fine-tuning of Llama 3 (Mecklenburg et al., 2024), and training Transformer models (Vaswani, 2017) from scratch. The results were evaluated on the WMT23 Low-Resource Indic Language Translation Shared Task test data…
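As a rough illustration of the zero-shot and few-shot prompting mentioned in the abstract, the sketch below assembles a plain-text translation prompt from a handful of example pairs. The function name `build_few_shot_prompt`, the prompt wording, and the placeholder strings are assumptions made for illustration only; the abstract does not give the prompt format the team actually used.

```python
# Minimal sketch of few-shot prompt construction for translation.
# Assumption: the prompt layout below is hypothetical, not the authors' format.

def build_few_shot_prompt(examples, source_sentence,
                          src_lang="English", tgt_lang="Assamese"):
    """Return a prompt string with optional in-context example pairs.

    examples: list of (source, target) sentence pairs; an empty list
              yields a zero-shot prompt.
    """
    lines = [f"Translate the following sentences from {src_lang} to {tgt_lang}."]
    for src, tgt in examples:
        lines.append("")                      # blank line between examples
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    lines.append("")
    lines.append(f"{src_lang}: {source_sentence}")
    lines.append(f"{tgt_lang}:")              # the model completes this line
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder targets stand in for real reference translations.
    demo_pairs = [
        ("Hello.", "<reference translation 1>"),
        ("Thank you.", "<reference translation 2>"),
    ]
    print(build_few_shot_prompt(demo_pairs, "Good morning."))
```

Passing an empty list of examples gives the zero-shot variant; the resulting string would then be sent to whichever LLM is being evaluated.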

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Big Data and Digital Economy