Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation

Esther Ploeger; Huiyuan Lai; Rik van Noord; Antonio Toral

arXiv:2408.17308 · cs.CL · September 2, 2024

TL;DR

This paper introduces a reranking method for recovering the lexical diversity lost in literary machine translation, aiming for translations whose vocabulary richness is closer to that of human translations.

Contribution

It proposes reranking translation candidates with a classifier that distinguishes original from translated text, so that lexical diversity is recovered to the degree each book calls for rather than increased rigidly.

Findings

1. Achieves lexical diversity scores close to those of human translations for certain books.
2. Demonstrates that lexical diversity varies considerably across different novels.
3. Validates reranking as a way to recover lexical diversity lost in machine translation.
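
The page does not say which lexical diversity measure the paper uses. As a hedged illustration only, the sketch below computes two common ones, the type-token ratio (TTR) and the moving-average TTR (MATTR), over whitespace-tokenized text. The function names, the tokenization, and the default window of 500 tokens are assumptions, not the authors' setup.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Share of distinct word types among all tokens (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=500):
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens.

    Falls back to plain TTR when the text is not longer than the window.
    (Window size 500 is an assumed default, not taken from the paper.)
    """
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    counts = Counter(tokens[:window])
    total = len(counts) / window
    for i in range(window, len(tokens)):
        # Slide the window one token to the right:
        # add the incoming token, then drop the outgoing one.
        counts[tokens[i]] += 1
        outgoing = tokens[i - window]
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
        total += len(counts) / window
    n_windows = len(tokens) - window + 1
    return total / n_windows


# Hypothetical toy sentences, tokenized by whitespace.
human = "the old man looked at the sea and the sky".split()
mt = "the man looked at the sea and the man looked".split()
print(type_token_ratio(human))  # 0.8
print(type_token_ratio(mt))     # 0.6
```

MATTR averages TTR over a fixed-size sliding window, so it is less sensitive to text length than raw TTR, which matters when comparing whole novels of different lengths.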

Abstract

Machine translations are found to be lexically poorer than human translations. The loss of lexical diversity through MT poses an issue in the automatic translation of literature, where it matters not only what is written, but also how it is written. Current methods for increasing lexical diversity in MT are rigid. Yet, as we demonstrate, the degree of lexical diversity can vary considerably across different novels. Thus, rather than aiming for the rigid increase of lexical diversity, we reframe the task as recovering what is lost in the machine translation process. We propose a novel approach that consists of reranking translation candidates with a classifier that distinguishes between original and translated text. We evaluate our approach on 31 English-to-Dutch book translations, and find that, for certain books, our approach retrieves lexical diversity scores that are close to human…
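
The reranking idea in the abstract can be sketched as follows, under stated assumptions: the MT system supplies an n-best list of candidate translations in its own preference order, and some classifier returns the probability that a sentence is original (non-translated) text. The function `rerank` and the stand-in scorer `toy_p_original` are hypothetical. The toy scorer simply returns a candidate's type-token ratio so the example runs without a trained model, whereas the abstract describes a classifier that distinguishes original from translated text.

```python
from typing import Callable, Sequence


def rerank(candidates: Sequence[str],
           p_original: Callable[[str], float]) -> list[str]:
    """Order MT candidates by the classifier's probability that each one
    reads like original (non-translated) text, highest first.

    `candidates` is assumed to be an n-best list in the MT model's own
    order; Python's sort is stable (also with reverse=True), so candidates
    with equal scores keep the MT model's ranking.
    """
    return sorted(candidates, key=p_original, reverse=True)


def toy_p_original(sentence: str) -> float:
    # HYPOTHETICAL stand-in for a trained original-vs-translated classifier:
    # here just the type-token ratio of the lowercased, whitespace-split words.
    words = sentence.lower().split()
    return len(set(words)) / len(words) if words else 0.0


# Hypothetical Dutch n-best list for one English source sentence.
n_best = [
    "de man zag de man",
    "de oude man zag zijn zoon",
    "de man zag een man",
]
print(rerank(n_best, toy_p_original)[0])  # de oude man zag zijn zoon
```

Keeping the MT model's order as the tie-breaker means reranking only changes the output where the classifier actually prefers a different candidate.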

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling