Domain Terminology Integration into Machine Translation: Leveraging Large Language Models
Yasmin Moslem, Gianfranco Romani, Mahdi Molaei, Rejwanul Haque, John D. Kelleher, Andy Way

TL;DR
This paper presents a novel approach using large language models to generate synthetic data and post-edit translations, significantly improving the integration of technical terms in machine translation for multiple language pairs.
Contribution
It introduces a four-step method that combines LLM-generated synthetic data with LLM-based post-editing to improve terminology accuracy in MT systems.
Findings
Terms incorporated into translations nearly doubled from 36.67% to 72.88%.
The approach improved terminology accuracy across three language pairs: German-to-English (DE-EN), English-to-Czech (EN-CS), and Chinese-to-English (ZH-EN).
The method effectively enhances domain-specific translation quality.
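To make the headline percentage concrete, here is a minimal Python sketch of how a term-incorporation rate could be computed: the share of required target terms that actually appear in the system's translations. The function name `term_incorporation_rate`, the case-insensitive substring matching, and the example sentences are all assumptions made for illustration. The shared task's official scoring may differ, for example in how it handles inflected forms.

```python
def term_incorporation_rate(translations, required_terms):
    """Return the percentage of required target terms found in the translations.

    translations:   list of translated sentences (strings)
    required_terms: list of lists; required_terms[i] holds the target-side
                    terms expected in translations[i]

    Matching is a plain case-insensitive substring test, which is an
    assumption of this sketch and not the official evaluation method.
    """
    total = 0
    found = 0
    for translation, terms in zip(translations, required_terms):
        lowered = translation.lower()
        for term in terms:
            total += 1
            if term.lower() in lowered:
                found += 1
    # Avoid division by zero when no terms are required at all.
    return 100.0 * found / total if total else 0.0


# Hypothetical example data: one required term per sentence.
example_translations = [
    "The patient received an intravenous infusion.",
    "The pump was switched off.",
]
example_terms = [
    ["intravenous infusion"],  # present in the first translation
    ["infusion pump"],         # missing from the second translation
]

rate = term_incorporation_rate(example_translations, example_terms)
print(f"{rate:.2f}% of required terms were incorporated")
```

Substring matching keeps the sketch short, but it would miss inflected variants of a term, which matters for a morphologically rich target language such as Czech. A lemma-based comparison would be one way to handle that case.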
Abstract
This paper discusses the methods that we used for our submissions to the WMT 2023 Terminology Shared Task for German-to-English (DE-EN), English-to-Czech (EN-CS), and Chinese-to-English (ZH-EN) language pairs. The task aims to advance machine translation (MT) by challenging participants to develop systems that accurately translate technical terms, ultimately enhancing communication and understanding in specialised domains. To this end, we conduct experiments that utilise large language models (LLMs) for two purposes: generating synthetic bilingual terminology-based data, and post-editing translations generated by an MT model through incorporating pre-approved terms. Our system employs a four-step process: (i) using an LLM to generate bilingual synthetic data based on the provided terminology, (ii) fine-tuning a generic encoder-decoder MT model, with a mix of the terminology-based…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices
