Multiple References with Meaningful Variations Improve Literary Machine Translation

Si Wu; John Wieting; David A. Smith

arXiv:2412.18707·cs.CL·February 27, 2025

Open Access

TL;DR

This study examines how multiple, meaningfully varied references affect literary machine translation, finding that fine-tuning on paraphrases with medium and high semantic similarity outperforms fine-tuning on an unfiltered dataset.

Contribution

The paper shows that using paraphrases with medium and high semantic similarity as multiple references improves literary MT performance, and it offers best practices for reference selection.

Findings

01

Paraphrases with medium and high semantic similarity improve BLEU, COMET, and chrF++ scores.

02

When fine-tuning an LLM, paraphrases with medium and high semantic similarity outperform an unfiltered dataset.

03

With the total number of training instances held constant, single-reference training on more source texts only marginally outperforms multiple-reference training on half as many source texts.

Abstract

While a source sentence can be translated in many ways, most machine translation (MT) models are trained with only a single reference. Previous work has shown that using synthetic paraphrases can improve MT. This paper investigates best practices for employing multiple references by analyzing the semantic similarity among different English translations of world literature in the Par3 dataset. We classify the semantic similarity between paraphrases into three levels: low, medium, and high, and fine-tune three different models (mT5-large, LLaMA-2-7B, and Opus-MT) for literary MT tasks. Across different models, holding the total training instances constant, single-reference but more source texts only marginally outperforms multiple-reference with half of the source texts. Moreover, when fine-tuning an LLM, using paraphrases with medium and high semantic similarity outperforms an unfiltered…
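The two experimental ideas in the abstract, grouping paraphrases into three similarity levels and comparing single-reference against multiple-reference training at a fixed number of instances, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the similarity measure used in the paper is not named in this excerpt, so a Jaccard token overlap stands in for it; the cut-offs 0.4 and 0.7, the function names, and the toy corpus are all hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, English reference)


def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap stand-in for a semantic similarity score in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def similarity_level(score: float, low_cut: float = 0.4, high_cut: float = 0.7) -> str:
    """Map a score to low / medium / high. The cut-offs are hypothetical."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


def single_reference_set(corpus: Dict[str, List[str]], budget: int) -> List[Pair]:
    """One reference per source, drawn from `budget` distinct sources."""
    sources = list(corpus)[:budget]
    return [(src, corpus[src][0]) for src in sources]


def multi_reference_set(
    corpus: Dict[str, List[str]], budget: int, refs_per_source: int = 2
) -> List[Pair]:
    """`refs_per_source` references per source, drawn from fewer sources,
    so the total number of training instances stays at `budget`."""
    sources = list(corpus)[: budget // refs_per_source]
    return [(src, ref) for src in sources for ref in corpus[src][:refs_per_source]]


# Toy data (invented for illustration): each source has two English references.
corpus = {
    "source sentence 1": ["He walked slowly home.", "He went home at a slow pace."],
    "source sentence 2": ["The night was cold.", "It was a cold night."],
    "source sentence 3": ["She said nothing.", "She kept silent."],
    "source sentence 4": ["They left at dawn.", "At dawn they departed."],
}

single = single_reference_set(corpus, budget=4)  # 4 sources x 1 reference
multi = multi_reference_set(corpus, budget=4)    # 2 sources x 2 references

for src, refs in corpus.items():
    score = jaccard_similarity(refs[0], refs[1])
    print(src, similarity_level(score))
```

Both `single` and `multi` contain four training pairs, but `multi` covers only half as many source sentences, which mirrors the constant-budget comparison described in the abstract.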

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling