Yes-MT's Submission to the Low-Resource Indic Language Translation Shared Task in WMT 2024
Yash Bhaskar, Parameswari Krishnamurthy

TL;DR
This paper describes the Yes-MT team's approaches to low-resource Indic language translation for WMT 2024: fine-tuning pre-trained models, prompting LLMs, and training Transformer models from scratch.
Contribution
It explores several fine-tuning and prompting techniques for low-resource Indic language translation, including LoRA fine-tuning, zero-shot and few-shot prompting of LLMs, and training Transformer models from scratch.
Findings
LLMs show potential in low-resource translation tasks.
Fine-tuning pre-trained models improves translation quality.
Challenges remain due to limited data availability.
Abstract
This paper presents the systems submitted by the Yes-MT team for the Low-Resource Indic Language Translation Shared Task at WMT 2024 (Pakray et al., 2024), focusing on translating between English and the Assamese, Mizo, Khasi, and Manipuri languages. The experiments explored various approaches, including fine-tuning pre-trained models like mT5 (Xue et al., 2020) and IndicBart (Dabre et al., 2021) in both multilingual and monolingual settings, LoRA (Hu et al., 2021) fine-tuning of IndicTrans2 (Gala et al., 2023), zero-shot and few-shot prompting (Brown, 2020) with large language models (LLMs) like Llama 3 (Dubey et al., 2024) and Mixtral 8x7B (Jiang et al., 2024), LoRA supervised fine-tuning of Llama 3 (Mecklenburg et al., 2024), and training Transformer models (Vaswani, 2017) from scratch. The results were evaluated on the WMT23 Low-Resource Indic Language Translation Shared Task test data…
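The zero-shot and few-shot prompting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' prompt template, which this summary does not give; the function name `build_prompt`, the instruction wording, and the line layout are all assumptions made for illustration. An empty example list gives a zero-shot prompt, and one or more example pairs gives a few-shot prompt.

```python
from typing import List, Tuple


def build_prompt(
    src_lang: str,
    tgt_lang: str,
    examples: List[Tuple[str, str]],
    source_text: str,
) -> str:
    """Build a translation prompt (hypothetical layout, not the paper's template).

    examples: (source, target) sentence pairs used as in-context demonstrations.
              Pass an empty list for a zero-shot prompt.
    """
    # Instruction line telling the model which direction to translate.
    lines = [f"Translate the following text from {src_lang} to {tgt_lang}."]

    # Few-shot demonstrations: each pair is shown as a source line and a target line.
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")

    # The sentence to translate, followed by an open target slot for the model to complete.
    lines.append(f"{src_lang}: {source_text}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Zero-shot: no demonstrations.
zero_shot = build_prompt("English", "Assamese", [], "Hello")

# Few-shot: placeholders stand in for real parallel sentences from the training data.
few_shot = build_prompt(
    "English",
    "Assamese",
    [("<English sentence 1>", "<Assamese translation 1>")],
    "Hello",
)
```

The resulting string would be sent to an LLM such as Llama 3 or Mixtral 8x7B, and the model's completion after the final target-language label would be taken as the translation.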
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Big Data and Digital Economy
