TULUN: Transparent and Adaptable Low-resource Machine Translation

Raphaël Merx; Hanna Suominen; Lois Hong; Nick Thieberger; Trevor Cohn; Ekaterina Vylomova

arXiv:2505.18683 · cs.CL · May 27, 2025

TL;DR

Tulun is a versatile, open-source platform that enhances low-resource machine translation by integrating terminology-aware neural MT with LLM-based post-editing, improving accuracy across specialized domains and languages.

Contribution

The paper introduces Tulun, a user-friendly system that combines neural MT and LLMs with terminology resources to produce domain-adapted translations without fine-tuning.

Findings

1. Improves ChrF++ by 16.90 to 22.41 points on medical and disaster relief translation.

2. Outperforms standalone MT and LLM approaches on FLORES low-resource languages.

3. Demonstrates effectiveness in both real-world and benchmark scenarios.
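
The first finding is reported in ChrF++ points. As a rough illustration of what that metric measures, the sketch below computes a simplified ChrF++-style score: an F-score (recall weighted by beta = 2) averaged over character n-grams of order 1 to 6 and word n-grams of order 1 to 2. This is a minimal sketch and not the evaluation code behind the reported numbers; the function names are hypothetical, and reference implementations differ in details such as whitespace and punctuation handling.

```python
from collections import Counter


def ngrams(seq, n):
    """Count the n-grams of a sequence (a string or a list of words)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def f_beta(hyp_counts, ref_counts, beta=2.0):
    """F-beta over clipped n-gram matches; None if either side has no n-grams."""
    hyp_total = sum(hyp_counts.values())
    ref_total = sum(ref_counts.values())
    if hyp_total == 0 or ref_total == 0:
        return None
    matches = sum((hyp_counts & ref_counts).values())
    precision = matches / hyp_total
    recall = matches / ref_total
    if precision + recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)


def simple_chrf_pp(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified ChrF++-style score in [0, 100] (illustrative only)."""
    scores = []
    # Character n-grams, with spaces removed (a simplifying assumption).
    hyp_chars = hypothesis.replace(" ", "")
    ref_chars = reference.replace(" ", "")
    for n in range(1, char_order + 1):
        score = f_beta(ngrams(hyp_chars, n), ngrams(ref_chars, n), beta)
        if score is not None:
            scores.append(score)
    # Word n-grams are what distinguish ChrF++ from plain ChrF.
    hyp_words = hypothesis.split()
    ref_words = reference.split()
    for n in range(1, word_order + 1):
        score = f_beta(ngrams(hyp_words, n), ngrams(ref_words, n), beta)
        if score is not None:
            scores.append(score)
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


reference = "the clinic is open"
print(simple_chrf_pp("the clinic is open today", reference))
print(simple_chrf_pp("a shop was shut", reference))
```

A hypothesis that shares more character and word n-grams with the reference scores higher, so a gain of several points reflects closer surface overlap with the reference translation.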

Abstract

Machine translation (MT) systems that support low-resource languages often struggle on specialized domains. While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for non-technical users and small organizations. To address this gap, we propose Tulun, a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories. Our open-source web-based platform enables users to easily create, edit, and leverage terminology resources, fostering a collaborative human-machine translation process that respects and incorporates domain expertise while increasing MT accuracy. Evaluations show effectiveness in both real-world and benchmark scenarios: on medical and disaster relief translation…
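
The abstract describes LLM-based post-editing guided by glossaries and translation memories. The sketch below shows one way such guidance could be assembled into a prompt: look up glossary terms that occur in the source sentence, retrieve similar translation-memory entries, and combine both with a draft translation. This is a sketch under stated assumptions, not the Tulun implementation: every name here (`GlossaryEntry`, `match_glossary`, `similar_memories`, `build_post_edit_prompt`), the prompt wording, and the similarity threshold are hypothetical, and the target-language strings are placeholders rather than real translations.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass(frozen=True)
class GlossaryEntry:
    source_term: str
    target_term: str


def match_glossary(source: str, glossary: List[GlossaryEntry]) -> List[GlossaryEntry]:
    """Return entries whose source term occurs in the sentence (case-insensitive substring)."""
    lowered = source.lower()
    return [entry for entry in glossary if entry.source_term.lower() in lowered]


def similar_memories(source: str, memory: List[Tuple[str, str]],
                     k: int = 2, threshold: float = 0.6) -> List[Tuple[str, str]]:
    """Return up to k (source, target) pairs whose source text resembles the input."""
    scored = []
    for mem_source, mem_target in memory:
        ratio = SequenceMatcher(None, source.lower(), mem_source.lower()).ratio()
        if ratio >= threshold:
            scored.append((ratio, mem_source, mem_target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(s, t) for _, s, t in scored[:k]]


def build_post_edit_prompt(source: str, draft: str,
                           terms: List[GlossaryEntry],
                           examples: List[Tuple[str, str]]) -> str:
    """Assemble a post-editing prompt; the wording is illustrative only."""
    lines = [
        "Post-edit the draft translation so that it follows the glossary.",
        f"Source: {source}",
        f"Draft translation: {draft}",
        "Glossary:",
    ]
    lines += [f"- {e.source_term} -> {e.target_term}" for e in terms]
    lines.append("Similar past translations:")
    lines += [f"- {s} -> {t}" for s, t in examples]
    lines.append("Corrected translation:")
    return "\n".join(lines)


# Placeholder data: target-side strings are stand-ins, not real translations.
glossary = [GlossaryEntry("fever", "<TGT:fever>"), GlossaryEntry("malaria", "<TGT:malaria>")]
memory = [("The patient has a cough", "<TGT sentence A>"), ("Roads are closed", "<TGT sentence B>")]
source = "The patient has a fever"
draft = "<draft from the MT model>"

prompt = build_post_edit_prompt(
    source, draft, match_glossary(source, glossary), similar_memories(source, memory)
)
print(prompt)
```

The resulting prompt would then be sent to an LLM of the user's choice. Because the glossary and memory are plain data passed in at request time, terminology can be updated without retraining any model, which matches the abstract's emphasis on adaptation without fine-tuning.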

Code & Models

Repositories

raphaelmerx/tulun (Official)

Videos

Tulun: Transparent and Adaptable Low-resource Machine Translation · Underline

Taxonomy

Topics: Natural Language Processing Techniques