Multiple References with Meaningful Variations Improve Literary Machine Translation
Si Wu, John Wieting, David A. Smith

TL;DR
This study examines how training on multiple, meaningfully varied references affects literary machine translation. It finds that paraphrases filtered to medium and high semantic similarity yield better translations than an unfiltered set of references.
Contribution
The paper shows that fine-tuning on paraphrases with medium and high semantic similarity improves literary MT over fine-tuning on an unfiltered dataset, and it offers best practices for selecting multiple references.
Findings
Medium and high semantic similarity paraphrases improve BLEU, COMET, and chrF++ scores.
When fine-tuning an LLM, training on paraphrases with medium and high semantic similarity outperforms training on an unfiltered dataset.
With the total number of training instances held constant, single-reference training on more source texts only marginally outperforms multiple-reference training on half as many source texts.
Abstract
While a source sentence can be translated in many ways, most machine translation (MT) models are trained with only a single reference. Previous work has shown that using synthetic paraphrases can improve MT. This paper investigates best practices for employing multiple references by analyzing the semantic similarity among different English translations of world literature in the Par3 dataset. We classify the semantic similarity between paraphrases into three levels: low, medium, and high, and fine-tune three different models (mT5-large, LLaMA-2-7B, and Opus-MT) for literary MT tasks. Across different models, holding the total training instances constant, single-reference but more source texts only marginally outperforms multiple-reference with half of the source texts. Moreover, when fine-tuning an LLM, using paraphrases with medium and high semantic similarity outperforms an unfiltered…
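The two design choices described in the abstract, bucketing paraphrases by semantic similarity and comparing single-reference against multiple-reference training under a fixed instance budget, can be illustrated with a minimal sketch. The threshold values, function names, and corpus layout below are hypothetical and chosen only for illustration; the summary does not state the paper's actual similarity metric or cut-offs.

```python
from typing import List, Tuple

# Hypothetical cut-offs on a similarity score in [0, 1]. The paper's actual
# thresholds and similarity metric are not given in this summary.
LOW_MAX = 0.5
HIGH_MIN = 0.8

# A corpus entry pairs one source sentence with its reference translations.
Corpus = List[Tuple[str, List[str]]]


def similarity_level(score: float) -> str:
    """Map a paraphrase-pair similarity score to 'low', 'medium', or 'high'."""
    if score < LOW_MAX:
        return "low"
    if score < HIGH_MIN:
        return "medium"
    return "high"


def single_reference_set(corpus: Corpus, budget: int) -> List[Tuple[str, str]]:
    """Build `budget` training pairs: one reference each from `budget` sources."""
    return [(src, refs[0]) for src, refs in corpus[:budget]]


def multi_reference_set(
    corpus: Corpus, budget: int, refs_per_source: int = 2
) -> List[Tuple[str, str]]:
    """Build `budget` training pairs from budget // refs_per_source sources,
    keeping `refs_per_source` references for each source."""
    n_sources = budget // refs_per_source
    return [
        (src, ref)
        for src, refs in corpus[:n_sources]
        for ref in refs[:refs_per_source]
    ]


# Toy corpus: four source sentences with two English references each.
toy_corpus: Corpus = [
    (f"source {i}", [f"translation {i}a", f"translation {i}b"]) for i in range(4)
]
single = single_reference_set(toy_corpus, budget=4)
multi = multi_reference_set(toy_corpus, budget=4)
```

Both `single` and `multi` contain four training pairs, but `single` draws on four distinct source sentences while `multi` draws on two. This is the trade-off the abstract describes when the number of training instances is held constant.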
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
