Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting
Nikolay Bogoychev, Pinzhen Chen

TL;DR
This paper presents a domain-independent, minimal-effort approach to improve machine translation terminology accuracy by training a terminology-aware model and applying large language model-based refinement, achieving better terminology recall.
Contribution
It introduces a translate-then-refine method combining a terminology-aware model with large language model prompting for improved terminology correctness in translation.
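The refinement half of translate-then-refine hands a large language model three things: the source, the draft hypothesis from the terminology-aware model, and the terminology constraints. Below is a minimal sketch of assembling such a prompt. The template wording, the helper `build_refinement_prompt`, and the default language names are all hypothetical illustrations. They are not the prompt, model, or language pairs used in the submission.

```python
from typing import Dict

# Hypothetical prompt template: illustrative wording only, not the paper's prompt.
PROMPT_TEMPLATE = (
    "Source ({src_lang}): {source}\n"
    "Draft translation ({tgt_lang}): {hypothesis}\n"
    "Terminology constraints (source term -> required translation):\n"
    "{constraints}\n"
    "Rewrite the draft so that every required translation appears, "
    "changing as little else as possible. Output only the revised translation."
)


def build_refinement_prompt(
    source: str,
    hypothesis: str,
    terms: Dict[str, str],
    src_lang: str = "German",
    tgt_lang: str = "English",
) -> str:
    """Assemble a refinement prompt from a draft hypothesis and its term constraints."""
    # One bullet line per (source term, required target term) pair.
    constraint_lines = "\n".join(f"- {src} -> {tgt}" for src, tgt in terms.items())
    return PROMPT_TEMPLATE.format(
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        source=source,
        hypothesis=hypothesis,
        constraints=constraint_lines,
    )


if __name__ == "__main__":
    print(build_refinement_prompt(
        source="Der Patient erhielt ein Antikoagulans.",
        hypothesis="The patient received a blood thinner.",
        terms={"Antikoagulans": "anticoagulant"},
    ))
```

The resulting string would be sent to whichever LLM performs the refinement. Keeping the draft in the prompt asks the model to repair terminology instead of translating from scratch.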
Findings
Terminology-aware model effectively incorporates terminology constraints.
Large language model refinement improves terminology recall.
The approach is domain-independent and requires minimal manual effort.
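Terminology recall can be read as the fraction of required target terms that actually appear in the system output. The sketch below uses case-insensitive substring matching, a deliberately simple definition. The official WMT 2023 terminology metric may match terms differently, for example with lemmatization or word boundaries, so treat `terminology_recall` as a hypothetical helper.

```python
from typing import List, Tuple


def terminology_recall(
    hypotheses: List[str],
    constraints: List[List[Tuple[str, str]]],
) -> float:
    """Fraction of required target terms found (case-insensitive substring) in the hypotheses."""
    if len(hypotheses) != len(constraints):
        raise ValueError("need one constraint list per hypothesis")
    found = 0
    total = 0
    for hyp, terms in zip(hypotheses, constraints):
        hyp_lower = hyp.lower()
        for _src_term, tgt_term in terms:
            total += 1
            if tgt_term.lower() in hyp_lower:
                found += 1
    # Sentences without constraints contribute nothing; avoid dividing by zero.
    return found / total if total else 0.0


if __name__ == "__main__":
    hyps = ["The patient received an anticoagulant.", "The dose was reduced."]
    cons = [[("Antikoagulans", "anticoagulant")], [("Dosis", "dosage")]]
    print(terminology_recall(hyps, cons))  # 0.5: "dosage" is missing
```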
Abstract
Terminology correctness is important in the downstream application of machine translation, and a prevalent way to ensure this is to inject terminology constraints into a translation system. In our submission to the WMT 2023 terminology translation task, we adopt a translate-then-refine approach which can be domain-independent and requires minimal manual effort. We annotate random source words with pseudo-terminology translations obtained from word alignment to first train a terminology-aware model. Further, we explore two post-processing methods. First, we use an alignment process to discover whether a terminology constraint has been violated, and if so, we re-decode with the violating word negatively constrained. Alternatively, we leverage a large language model to refine a hypothesis by providing it with terminology constraints. Results show that our terminology-aware model learns to…
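Both alignment-based steps in the abstract operate on source–target word alignments. The first is creating pseudo-terminology training data. The second is detecting constraint violations so the offending word can be negatively constrained on re-decoding. Below is a minimal sketch, assuming whitespace-tokenized text, alignments in the Pharaoh `i-j` format produced by aligners such as fast_align, and hypothetical inline tags `<t>`/`</t>`. The tag format, the sampling `rate`, and all function names are illustrative assumptions, not the paper's exact annotation scheme.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Alignment = List[Tuple[int, int]]

# Hypothetical inline tags marking a pseudo-terminology translation.
TERM_OPEN, TERM_CLOSE = "<t>", "</t>"


def parse_alignment(pharaoh: str) -> Alignment:
    """Parse Pharaoh-format alignments such as '0-0 1-2' into (src, tgt) index pairs."""
    return [tuple(map(int, link.split("-"))) for link in pharaoh.split()]


def annotate_pseudo_terms(
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignment: Alignment,
    rate: float = 0.1,
    rng: Optional[random.Random] = None,
) -> str:
    """Insert the aligned target word(s) after randomly sampled source words."""
    rng = rng or random.Random()
    links: Dict[int, List[int]] = {}
    for s, t in alignment:
        links.setdefault(s, []).append(t)
    out: List[str] = []
    for i, word in enumerate(src_tokens):
        out.append(word)
        # Only aligned source words can receive a pseudo-terminology translation.
        if i in links and rng.random() < rate:
            translation = " ".join(tgt_tokens[t] for t in sorted(links[i]))
            out.extend([TERM_OPEN, translation, TERM_CLOSE])
    return " ".join(out)


def find_violations(
    src_tokens: List[str],
    hyp_tokens: List[str],
    alignment: Alignment,
    terms: Dict[str, str],
) -> Set[str]:
    """Return hypothesis words aligned to a constrained source word but not part of its required translation."""
    required: Dict[int, Set[str]] = {
        i: set(terms[w].lower().split())
        for i, w in enumerate(src_tokens)
        if w in terms
    }
    banned: Set[str] = set()
    for s, t in alignment:
        if s in required and hyp_tokens[t].lower() not in required[s]:
            banned.add(hyp_tokens[t])
    return banned


if __name__ == "__main__":
    align = parse_alignment("0-0 1-1 2-2")
    print(annotate_pseudo_terms(["Der", "Hund", "bellt"], ["The", "dog", "barks"], align, rate=1.0))
    src = "Der Patient erhielt ein Antikoagulans".split()
    hyp = "The patient received a thinner".split()
    print(find_violations(src, hyp, parse_alignment("0-0 1-1 2-2 3-3 4-4"),
                          {"Antikoagulans": "anticoagulant"}))  # {'thinner'}
```

In this sketch, the words returned by `find_violations` are what would be handed to a decoder that supports negative lexical constraints for the re-decoding pass. How the submission actually passes them to its decoder is not specified here.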
