Efficient Terminology Integration for LLM-based Translation in Specialized Domains
Sejoon Kim, Mingi Sung, Jeonghwan Lee, Hyunkuk Lim, Jorge Froilan Gimenez Perez

TL;DR
This paper presents a new method for training large language models to better handle specialized terminology in translation tasks, using efficient term extraction and data reconstruction techniques, resulting in high-quality, consistent translations in domain-specific fields.
Contribution
The paper introduces a systematic approach combining Trie Tree-based term extraction and data reconstruction to improve terminology integration in LLM-based translation with less data.
Findings
Achieved highest translation score in WMT patent task
Enhanced model's ability to handle specialized terminology
Demonstrated broad applicability in domain-specific translation
Abstract
Traditional machine translation methods typically involve training models directly on large parallel corpora, with limited emphasis on specialized terminology. However, in specialized fields such as the patent, finance, or biomedical domains, terminology is crucial for translation, and many terms need to be translated following agreed-upon conventions. In this paper, we introduce a methodology that efficiently trains models with a smaller amount of data while preserving the accuracy of terminology translation. We achieve this through a systematic process of term extraction and glossary creation using the Trie Tree algorithm, followed by data reconstruction to teach the LLM how to integrate these specialized terms. This methodology enhances the model's ability to handle specialized terminology and ensures high-quality translations, particularly in fields where term consistency is…
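The abstract does not spell out the extraction or reconstruction steps, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a word-level trie built from a source-to-target glossary, longest-match lookup over whitespace tokens, and a hypothetical prompt layout that prepends the matched term pairs to the source sentence. The names `build_trie`, `find_terms`, and `reconstruct_example`, the prompt wording, and the sample glossary entries are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a word-level trie; `target` is set where a glossary term ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.target: Optional[str] = None


def build_trie(glossary: Dict[str, str]) -> TrieNode:
    """Insert every source-side term of the glossary, token by token."""
    root = TrieNode()
    for source_term, target_term in glossary.items():
        node = root
        for token in source_term.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.target = target_term
    return root


def find_terms(tokens: List[str], root: TrieNode) -> List[Tuple[str, str]]:
    """Scan left to right, keeping the longest glossary match at each position."""
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        node = root
        best: Optional[Tuple[int, str]] = None  # (end index, target term)
        j = i
        while j < len(tokens) and tokens[j].lower() in node.children:
            node = node.children[tokens[j].lower()]
            j += 1
            if node.target is not None:
                best = (j, node.target)
        if best is not None:
            end, target = best
            matches.append((" ".join(tokens[i:end]), target))
            i = end  # skip past the matched term
        else:
            i += 1
    return matches


def reconstruct_example(source: str, reference: str,
                        glossary: Dict[str, str]) -> Dict[str, str]:
    """Build one training record whose prompt carries the matched term pairs.

    The prompt format here is an assumption made for illustration only.
    """
    terms = find_terms(source.split(), build_trie(glossary))
    if terms:
        hints = "; ".join(f"{src} -> {tgt}" for src, tgt in terms)
        prompt = f"Translate using these terms: {hints}\n{source}"
    else:
        prompt = f"Translate:\n{source}"
    return {"prompt": prompt, "completion": reference}


# Hypothetical glossary entries, not taken from the paper.
glossary = {
    "semiconductor device": "Halbleiterbauelement",
    "device": "Vorrichtung",
}
matches = find_terms("The semiconductor device includes a device".split(),
                     build_trie(glossary))
```

Longest-match lookup is used so that a multi-word term such as "semiconductor device" takes precedence over the shorter entry "device" that it contains; a real pipeline would likely also need proper tokenization and handling of punctuation and inflection.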
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Linguistics and Terminology Studies
