Learning to Translate Ambiguous Terminology by Preference Optimization on Post-Edits
Nathaniel Berger, Johannes Eschbach-Dymanus, Miriam Exel, Matthias Huck, Stefan Riezler

TL;DR
This paper introduces a preference optimization method for neural machine translation (NMT) that learns from human post-edits to disambiguate terminology, without requiring one-to-one dictionaries or human intervention at decoding time.
Contribution
It proposes a framework that treats post-edited terminology as the preferred output in preference optimization, improving terminology disambiguation in NMT without hard constraints during decoding, soft constraints in the input, or human intervention at decoding time.
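The data side of this idea can be pictured as preference pairs built from post-editing records. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the `PostEditRecord` and `PreferencePair` types, the `build_preference_pairs` helper, and the English-German example sentences are all hypothetical. It treats the post-edited translation as the preferred output and the original machine translation as the rejected one, and skips records where the post-editor changed nothing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PostEditRecord:
    """Hypothetical record: source sentence, raw MT output, human post-edit."""
    source: str
    mt_output: str
    post_edit: str


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical training example for preference optimization."""
    prompt: str
    chosen: str    # post-edited translation (preferred)
    rejected: str  # original MT output (dispreferred)


def build_preference_pairs(records: List[PostEditRecord]) -> List[PreferencePair]:
    """Turn post-edit records into (prompt, chosen, rejected) pairs."""
    pairs = []
    for rec in records:
        if rec.post_edit.strip() == rec.mt_output.strip():
            continue  # no correction was made, so there is no preference signal
        pairs.append(PreferencePair(rec.source, rec.post_edit, rec.mt_output))
    return pairs


# Invented example: "order" can be rendered as "Bestellung" or "Auftrag";
# both are valid dictionary entries, but only one fits a given context.
records = [
    PostEditRecord(
        source="Create the order.",
        mt_output="Legen Sie die Bestellung an.",
        post_edit="Legen Sie den Auftrag an.",
    ),
    PostEditRecord(
        source="Save your changes.",
        mt_output="Sichern Sie Ihre Änderungen.",
        post_edit="Sichern Sie Ihre Änderungen.",
    ),
]
pairs = build_preference_pairs(records)
```

Only the first record yields a pair here, because the second was left unchanged by the post-editor. A real pipeline would probably also need to check that the edit actually touches a terminology entry, which this sketch does not attempt.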
Findings
Significant improvement in term accuracy over baseline
Effective combination of supervised fine-tuning and preference optimization
No significant loss in overall translation quality (COMET score)
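To make the combination of supervised fine-tuning and preference optimization concrete, here is a minimal sketch of one possible objective. It is an assumption, not the paper's stated loss: this summary does not say which preference objective is used, or whether the two signals are summed in one loss or applied in sequence. The sketch assumes a DPO-style preference term plus a weighted negative log-likelihood term on the post-edit. The function name `combined_loss` and the parameters `beta` and `sft_weight` are hypothetical, and the inputs are sequence-level log-probabilities.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def combined_loss(
    policy_chosen: float,    # log p_policy(post-edit | source)
    policy_rejected: float,  # log p_policy(MT output | source)
    ref_chosen: float,       # log p_ref(post-edit | source)
    ref_rejected: float,     # log p_ref(MT output | source)
    beta: float = 0.1,       # assumed preference temperature
    sft_weight: float = 1.0, # assumed weight of the supervised term
) -> float:
    """DPO-style preference loss plus a weighted NLL term on the chosen output."""
    # How much more the policy favors the post-edit than the reference model does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    preference_term = -log_sigmoid(beta * margin)
    # Supervised term: negative log-likelihood of the post-edit.
    sft_term = -policy_chosen
    return preference_term + sft_weight * sft_term
```

With `sft_weight=0.0` the function reduces to the preference term alone, which equals `log(2)` when the policy and reference model agree and decreases as the policy shifts probability toward the post-edit.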
Abstract
In real-world translation scenarios, terminology is rarely one-to-one. Instead, multiple valid translations may appear in a terminology dictionary, but correctness of a translation depends on corporate style guides and context. This can be challenging for neural machine translation (NMT) systems. Luckily, in a corporate context, many examples of human post-edits of valid but incorrect terminology exist. The goal of this work is to learn how to disambiguate our terminology based on these corrections. Our approach is based on preference optimization, using the term post-edit as the knowledge to be preferred. While previous work had to rely on unambiguous translation dictionaries to set hard constraints during decoding, or to add soft constraints in the input, our framework requires neither one-to-one dictionaries nor human intervention at decoding time. We report results on English-German…
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · linguistics and terminology studies
