Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu

Renhao Pei; Yihong Liu; Peiqin Lin; François Yvon; Hinrich Schütze

arXiv:2502.11862·cs.CL·May 30, 2025

TL;DR

This paper systematically investigates how different linguistic resources affect in-context machine translation for low-resource Manchu, highlighting the importance of dictionaries and parallel examples, and explores data augmentation to improve translation quality.

Contribution

The paper analyzes in detail how much each type of resource matters for in-context MT in low-resource languages, and demonstrates a novel application of data augmentation to improve translation quality.

Findings

01 High-quality dictionaries significantly improve translation performance.

02 Good parallel examples are crucial for effective in-context MT.

03 Grammar resources have minimal impact on translation quality.

Abstract

In-context machine translation (MT) with large language models (LLMs) is a promising approach for low-resource MT, as it can readily take advantage of linguistic resources such as grammar books and dictionaries. Such resources are usually selectively integrated into the prompt so that LLMs can directly perform translation without any specific training, via their in-context learning capability (ICL). However, the relative importance of each type of resource, e.g., dictionary, grammar book, and retrieved parallel examples, is not entirely clear. To address this gap, this study systematically investigates how each resource and its quality affect the translation performance, with the Manchu language as our case study. To remove any prior knowledge of Manchu encoded in the LLM parameters and single out the effect of ICL, we also experiment with an enciphered version of Manchu texts. Our…
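The abstract describes two mechanisms that a small example may make more concrete: selectively integrating dictionary entries and retrieved parallel examples into a prompt, and enciphering the source text so that the LLM cannot rely on prior knowledge of Manchu. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the prompt layout, the word-overlap retrieval, the letter-substitution cipher, and the default target language are all assumptions made for illustration, and the tokens are invented placeholders rather than real Manchu.

```python
import random
import string


def make_cipher(seed: int = 0) -> dict[str, str]:
    """Random one-to-one mapping over lowercase Latin letters (an assumed scheme)."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))


def encipher(text: str, cipher: dict[str, str]) -> str:
    """Substitute mapped letters; spaces, digits, and punctuation pass through."""
    return "".join(cipher.get(ch, ch) for ch in text)


def retrieve(sentence: str, parallel: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Pick the k pairs whose source side shares the most words with the sentence."""
    words = set(sentence.split())
    ranked = sorted(parallel, key=lambda p: len(words & set(p[0].split())), reverse=True)
    return ranked[:k]


def build_prompt(sentence, dictionary, parallel, k=2, target_lang="English"):
    """Assemble dictionary entries and retrieved examples into one prompt string."""
    lines = [f"Translate the sentence into {target_lang}.", "", "Dictionary entries:"]
    for word in dict.fromkeys(sentence.split()):  # unique words, original order
        if word in dictionary:
            lines.append(f"{word}: {dictionary[word]}")
    lines += ["", "Parallel examples:"]
    for src, tgt in retrieve(sentence, parallel, k):
        lines.append(f"{src} => {tgt}")
    lines += ["", f"Sentence: {sentence}", "Translation:"]
    return "\n".join(lines)


# Invented placeholder tokens, not real Manchu words or glosses.
dictionary = {"lomi": "gloss-A", "taku": "gloss-B", "venu": "gloss-C"}
parallel = [
    ("lomi venu", "translation-1"),
    ("taku taku", "translation-2"),
    ("venu", "translation-3"),
]

plain_prompt = build_prompt("lomi taku", dictionary, parallel, k=2)

# Enciphered condition: the same cipher is applied to every source-side string,
# so dictionary lookups and example retrieval still line up with the input.
cipher = make_cipher(seed=0)
enc_dictionary = {encipher(w, cipher): g for w, g in dictionary.items()}
enc_parallel = [(encipher(s, cipher), t) for s, t in parallel]
enc_prompt = build_prompt(encipher("lomi taku", cipher), enc_dictionary, enc_parallel, k=2)
```

Because the substitution is one-to-one, the enciphered prompt carries the same dictionary and example information as the plain one, only under unfamiliar surface forms. Comparing model output on `plain_prompt` and `enc_prompt` is one way to separate what comes from the prompt from what the model already knew.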

Code & Models

Repositories

cisnlp/manchu-in-context-mt (Official)

Videos

Understanding In-context Machine Translation for Low-Resource Languages: A Case Study on Manchu (Underline)

Taxonomy

Topics: Natural Language Processing Techniques