Beyond Many-Shot Translation: Scaling In-Context Demonstrations For Low-Resource Machine Translation
Luis Frentzen Salim, Esteban Carlin, Alexandre Morinvil, Xi Ai, Lun-Wei Ku

TL;DR
This paper investigates scaling in-context learning for low-resource machine translation using large language models, revealing that larger context windows have limited benefits and are highly dependent on the type of training data.
Contribution
It demonstrates the effects of scaling in-context demonstrations beyond few-shot settings for low-resource MT and compares different data types for in-context supervision.
Findings
Scaling context beyond a certain point offers diminishing returns.
Monolingual supervision can be competitive with parallel data.
Context window size impacts translation quality depending on data type.
Abstract
Building machine translation (MT) systems for low-resource languages is notably difficult due to the scarcity of high-quality data. Although Large Language Models (LLMs) have improved MT system performance, adapting them to lesser-represented languages remains challenging. In-context learning (ICL) may offer novel ways to adapt LLMs for low-resource MT by conditioning models on demonstration at inference time. In this study, we explore scaling low-resource machine translation ICL beyond the few-shot setting to thousands of examples with long-context models. We scale in-context token budget to 1M tokens and compare three types of training corpora used as in-context supervision: monolingual unsupervised data, instruction-style data, and parallel data (English--target and Indonesian--target). Our experiments on Javanese and Sundanese show that gains from additional context saturate quickly…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
