Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem

Sara Court; Micha Elsner

arXiv:2406.15625·cs.CL·October 28, 2024

TL;DR

This paper examines what large language models can and cannot do when translating a low-resource language, using Southern Quechua to Spanish experiments that vary the type of retrieved context, the retrieval method, and the model.

Contribution

It provides an empirical analysis of how different context types and retrieval methods affect LLM performance in low-resource translation tasks.

Findings

01 Even relatively small LLMs can use prompt context for zero-shot translation when given minimal linguistic information

02 Context type and retrieval method significantly influence translation quality

03 Applying LLMs to most of the world's low-resource languages remains limited

Abstract

This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline. We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of context retrieved from a constrained database of digitized pedagogical materials (dictionaries and grammar lessons) and parallel corpora. Using both automatic and human evaluation of model output, we conduct ablation studies that manipulate (1) context type (morpheme translations, grammar descriptions, and corpus examples), (2) retrieval methods (automated vs. manual), and (3) model type. Our results suggest that even relatively small LLMs are capable of utilizing prompt context for zero-shot low-resource translation when…
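The prompting setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `RetrievedContext` container, the `build_prompt` function, the prompt wording, and the toy entries are all assumptions introduced here, and none of them are drawn from the paper's database. The three boolean flags mirror the context-type ablation (morpheme translations, grammar descriptions, corpus examples).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RetrievedContext:
    """Hypothetical container for context retrieved for one source sentence."""
    morphemes: dict[str, str] = field(default_factory=dict)        # morpheme -> Spanish gloss
    grammar: list[str] = field(default_factory=list)               # grammar lesson excerpts
    examples: list[tuple[str, str]] = field(default_factory=list)  # (Quechua, Spanish) pairs


def build_prompt(
    source: str,
    ctx: RetrievedContext,
    use_morphemes: bool = True,
    use_grammar: bool = True,
    use_examples: bool = True,
) -> str:
    """Assemble a translation prompt; each flag switches one context type on or off."""
    parts = ["Translate the following Southern Quechua sentence into Spanish."]
    if use_morphemes and ctx.morphemes:
        parts.append("Morpheme translations:")
        parts.extend(f"- {m}: {g}" for m, g in ctx.morphemes.items())
    if use_grammar and ctx.grammar:
        parts.append("Grammar notes:")
        parts.extend(f"- {note}" for note in ctx.grammar)
    if use_examples and ctx.examples:
        parts.append("Example translations:")
        parts.extend(f"- {q} => {s}" for q, s in ctx.examples)
    parts.append(f"Quechua: {source}")
    parts.append("Spanish:")
    return "\n".join(parts)


# Toy, illustrative entries only (assumed for this sketch).
ctx = RetrievedContext(
    morphemes={"wasi": "casa", "-y": "mi"},
    grammar=["(grammar lesson excerpt would go here)"],
)
full = build_prompt("wasiy", ctx)
no_morphemes = build_prompt("wasiy", ctx, use_morphemes=False)
print(full)
```

Keeping each context type behind its own flag makes it straightforward to generate the prompt variants an ablation would compare, while the retrieval step (automated or manual) stays outside the function and simply fills the container.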

Taxonomy

Topics: Natural Language Processing Techniques · Library Science and Information Systems

Methods: Sparse Evolutionary Training