NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning
Zhongtao Miao, Kaiyan Zhao, Masaaki Nagata, Yoshimasa Tsuruoka

TL;DR
This paper introduces NeoAMT, an agentic framework trained with reinforcement learning to translate sentences that contain neologisms. It relies on a Wiktionary-based search toolkit and a new multilingual dataset.
Contribution
The paper presents an agentic framework for neologism-aware machine translation, together with a dedicated dataset, a search toolkit, and an RL training strategy that uses a new reward design.
Findings
Constructed a multilingual dataset with 16 languages and 75 translation directions.
Developed a Wiktionary-based search toolkit for neologism translation.
Proposed an RL training framework with a novel reward and adaptive rollout strategy.
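To make the idea of a dictionary-grounded search tool concrete, here is a minimal sketch of the kind of lookup a translation agent might call when it meets an unfamiliar term. Everything in it is an assumption for illustration: the `Entry` record, the three toy records in `CORPUS`, and the word-overlap scoring in `search` are hypothetical and are not taken from the paper, whose actual retrieval method and record format are not described in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary record: a term, its language code, and a gloss."""
    term: str
    lang: str
    gloss: str


# Toy records invented for this example; not drawn from the paper's corpus.
CORPUS = [
    Entry("doomscrolling", "en", "the practice of continuously reading bad news online"),
    Entry("rizz", "en", "charm or skill in attracting a romantic partner"),
    Entry("touch grass", "en", "to go outside and spend time away from the internet"),
]


def search(query: str, corpus: List[Entry] = CORPUS, top_k: int = 2) -> List[Entry]:
    """Return up to top_k entries ranked by word overlap with the query.

    An exact match on the term gets a large bonus so it always ranks first.
    Entries with no overlap at all are dropped.
    """
    q_tokens = set(query.lower().split())
    scored = []
    for entry in corpus:
        tokens = set(entry.term.lower().split()) | set(entry.gloss.lower().split())
        score = len(q_tokens & tokens)
        if entry.term.lower() == query.lower():
            score += 10  # exact-term bonus (arbitrary illustrative weight)
        if score > 0:
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


if __name__ == "__main__":
    for hit in search("rizz"):
        print(hit.term, "->", hit.gloss)
```

In an agentic setup, the glosses returned by such a lookup would be fed back to the model as context before it produces the translation. A real toolkit over millions of records would need a proper index rather than this linear scan.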
Abstract
Neologism-aware machine translation aims to translate source sentences containing neologisms into target languages. This field remains underexplored compared with general machine translation (MT). In this paper, we propose an agentic framework, NeoAMT, for neologism-aware machine translation equipped with a Wiktionary-based search toolkit. Specifically, we first construct a dedicated dataset for neologism-aware machine translation and build a search toolkit grounded in Wiktionary. The dataset covers 16 languages and 75 translation directions in total, derived from approximately 10 million records of an English Wiktionary dump. The retrieval corpus of the search toolkit is also constructed from around 3 million cleaned records of the same dump. We then leverage the dataset and toolkit to train a translation agent via reinforcement learning (RL) and to evaluate the accuracy of…
