In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation

Armel Zebaze; Benoît Sagot; Rachel Bawden

arXiv:2408.00397·cs.CL·August 2, 2024

TL;DR

This paper shows that selecting in-context examples by sentence-embedding similarity improves LLM-based machine translation into low-resource languages, in contrast to the mixed results reported in earlier work.

Contribution

The paper provides a systematic comparison of in-context example selection strategies across multiple LLMs and four language directions (English into French, German, Swahili and Wolof), highlighting the benefits of similarity search for low-resource translation.

Findings

01. Similarity search improves translation quality for low-resource languages.

02. The balance between diversity and quality in the example pool affects performance.

03. The paper proposes an adapted evaluation protocol for LLM-based MT.

Abstract

The ability of generative large language models (LLMs) to perform in-context learning has given rise to a large body of research into how best to prompt models for various natural language processing tasks. In this paper, we focus on machine translation (MT), a task that has been shown to benefit from in-context translation examples. However, no systematic studies have been published on how best to select examples, and mixed results have been reported on the usefulness of similarity-based selection over random selection. We provide a study covering multiple LLMs and multiple in-context example retrieval strategies, comparing multilingual sentence embeddings. We cover several language directions, representing different levels of language resourcedness (English into French, German, Swahili and Wolof). Contrary to previously published results, we find that sentence embedding similarity…
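As an illustration of the general idea of similarity-based example selection described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a toy character-trigram encoder in place of the multilingual sentence embeddings the paper compares, and the names `embed`, `cosine`, `select_examples` and `build_prompt`, the prompt template, and the ordering of examples are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for a sentence encoder: character-trigram counts.

    Assumption: a real setup would call a multilingual sentence
    embedding model here instead.
    """
    text = f"  {sentence.lower()}  "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_examples(source, pool, k=2):
    """Return the k (source, target) pairs whose source side is most similar."""
    query = embed(source)
    ranked = sorted(pool, key=lambda pair: cosine(query, embed(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(source, examples, src_lang="English", tgt_lang="French"):
    """Format a few-shot prompt (hypothetical template).

    Assumption: the most similar example is placed last, next to the
    sentence to translate.
    """
    blocks = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in reversed(examples)]
    blocks.append(f"{src_lang}: {source}\n{tgt_lang}:")
    return "\n\n".join(blocks)


# Hypothetical example pool of (source, reference translation) pairs.
pool = [
    ("The cat sleeps on the sofa.", "Le chat dort sur le canapé."),
    ("Stock markets fell sharply today.", "Les marchés boursiers ont fortement chuté aujourd'hui."),
    ("The dog sleeps on the bed.", "Le chien dort sur le lit."),
]
source = "The cat sleeps on the bed."
print(build_prompt(source, select_examples(source, pool, k=2)))
```

Random selection, the baseline mentioned in the abstract, would correspond to replacing the ranking step with something like `random.sample(pool, k)`; the rest of the prompt construction would stay the same.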

Code & Models

Repositories

armelrandy/icl-mt
PyTorch · Official

Videos

In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Focus