Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu
Renhao Pei, Yihong Liu, Peiqin Lin, François Yvon, Hinrich Schütze

TL;DR
This paper systematically investigates how different linguistic resources affect in-context machine translation for low-resource Manchu, highlighting the importance of dictionaries and parallel examples, and explores data augmentation to improve translation quality.
Contribution
It provides a detailed analysis of resource importance in in-context MT for low-resource languages and demonstrates a novel application of data augmentation to enhance translation models.
Findings
High-quality dictionaries significantly improve translation performance.
Good parallel examples are crucial for effective in-context MT.
Grammar resources have minimal impact on translation quality.
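The three resource types compared in these findings are usually combined into a single prompt. The following is a minimal sketch of such prompt assembly, not the authors' implementation: the function name `build_prompt`, the prompt wording, the whitespace-based dictionary lookup, and English as the target language are all assumptions made for illustration, and the tokens in the usage example are placeholders rather than real Manchu.

```python
def build_prompt(source, dictionary, parallel_examples, grammar_notes=""):
    """Assemble a translation prompt from optional linguistic resources.

    Hypothetical helper: the wording and layout are illustrative assumptions.
    """
    parts = ["Translate the following Manchu sentence into English."]

    # Dictionary: keep only entries for words that occur in the source.
    # dict.fromkeys removes repeated words while preserving their order.
    words = dict.fromkeys(source.split())
    entries = [f"{w}: {dictionary[w]}" for w in words if w in dictionary]
    if entries:
        parts.append("Dictionary entries:\n" + "\n".join(entries))

    # Grammar: included verbatim when provided.
    if grammar_notes:
        parts.append("Grammar notes:\n" + grammar_notes)

    # Parallel examples: (source, target) pairs, e.g. retrieved by similarity.
    if parallel_examples:
        shots = "\n".join(
            f"Manchu: {src}\nEnglish: {tgt}" for src, tgt in parallel_examples
        )
        parts.append("Parallel examples:\n" + shots)

    # The sentence to translate comes last, with an open slot for the answer.
    parts.append(f"Manchu: {source}\nEnglish:")
    return "\n\n".join(parts)


# Placeholder tokens only; these are not real Manchu words.
prompt = build_prompt(
    source="aa bb",
    dictionary={"aa": "gloss-1", "zz": "gloss-9"},
    parallel_examples=[("cc dd", "example translation")],
)
```

Because each resource is an optional argument, dropping or degrading one of them (for instance, passing an empty dictionary) gives an ablation of the kind the findings summarize.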
Abstract
In-context machine translation (MT) with large language models (LLMs) is a promising approach for low-resource MT, as it can readily take advantage of linguistic resources such as grammar books and dictionaries. Such resources are usually selectively integrated into the prompt so that LLMs can directly perform translation without any specific training, via their in-context learning (ICL) capability. However, the relative importance of each type of resource, e.g., dictionary, grammar book, and retrieved parallel examples, is not entirely clear. To address this gap, this study systematically investigates how each resource and its quality affect translation performance, with the Manchu language as our case study. To remove any prior knowledge of Manchu encoded in the LLM parameters and single out the effect of ICL, we also experiment with an enciphered version of Manchu texts. Our…
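The abstract does not specify how the Manchu texts are enciphered. One simple way to obtain such a version is a fixed letter-substitution cipher over romanized text, sketched below under that assumption. The names `make_cipher` and `encipher` are hypothetical, and characters outside a to z (for example diacritic letters used in romanization) are left unchanged here for brevity.

```python
import random
import string


def make_cipher(seed=0):
    """Build a fixed one-to-one mapping over the lowercase letters a to z.

    Assumption: a seeded random permutation stands in for whatever
    scheme the paper actually uses.
    """
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))


def encipher(text, table):
    """Replace each mapped character; leave everything else untouched."""
    return "".join(table.get(ch, ch) for ch in text)


table = make_cipher(seed=1)
ciphered = encipher("aa bb", table)  # placeholder tokens, not real Manchu
```

For the prompt to remain usable, the same table would have to be applied to every Manchu string in it: the source sentence, the dictionary headwords, and the Manchu side of the parallel examples. The mapping is one-to-one, so word identity and word boundaries are preserved while the surface forms no longer match anything the model may have seen in pretraining.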
Taxonomy
Topics: Natural Language Processing Techniques
