Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation
Vivek Iyer, Bhavitvya Malik, Pavel Stepachev, Pinzhen Chen, Barry Haddow, and Alexandra Birch

TL;DR
This paper investigates how data scale and diversity affect the adaptation of large language models for low-resource machine translation, emphasizing the importance of parallel data and cautioning against excessive diversity.
Contribution
It challenges recent trends by showing parallel data remains crucial and that diversity can hinder transfer in low-resource LLM-based translation.
Findings
Parallel data is essential during pre-training and fine-tuning.
Data diversity can cause interference rather than transfer.
Results are consistent across multiple low-resource language groups.
Abstract
Despite the recent popularity of Large Language Models (LLMs) in Machine Translation (MT), their performance in low-resource languages (LRLs) still lags significantly behind Neural Machine Translation (NMT) models. In this work, we explore what it would take to adapt LLMs for the low-resource setting. Particularly, we re-examine the role of two factors: a) the importance and application of parallel data, and b) diversity in Supervised Fine-Tuning (SFT). Recently, parallel data has seen reduced use in adapting LLMs for MT, while data diversity has been embraced to promote transfer across languages and tasks. However, for low-resource LLM-MT, we show that the opposite is true for both considerations: a) parallel data is critical during both pre-training and SFT; b) diversity tends to cause interference instead of transfer. Our experiments with three LLMs across two low-resourced language…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Shrink and Fine-Tune
