Domain-Specific Translation with Open-Source Large Language Models: Resource-Oriented Analysis
Aman Kassahun Wassie, Mahdi Molaei, Yasmin Moslem

TL;DR
This paper compares open-source large language models and specialized machine translation models for domain-specific medical translation, revealing that dedicated MT models currently outperform LLMs, especially in low-resource settings.
Contribution
It provides a comprehensive resource-oriented analysis of LLMs versus specialized MT models in medical translation across multiple languages.
Findings
NLLB-200 3.3B outperforms all evaluated LLMs in the 7-8B parameter range in three of the four language directions.
Fine-tuning improves LLMs such as Mistral and Llama, but they still underperform fine-tuned NLLB-200 3.3B models.
Larger LLMs show potential, indicating benefits of domain-specific pre-training.
Abstract
In this work, we compare the domain-specific translation performance of open-source autoregressive decoder-only large language models (LLMs) with task-oriented machine translation (MT) models. Our experiments focus on the medical domain and cover four language directions with varied resource availability: English-to-French, English-to-Portuguese, English-to-Swahili, and Swahili-to-English. Despite recent advancements, LLMs demonstrate a significant quality gap in specialized translation compared to multilingual encoder-decoder MT models such as NLLB-200. Our results indicate that NLLB-200 3.3B outperforms all evaluated LLMs in the 7-8B parameter range across three out of the four language directions. While fine-tuning improves the performance of LLMs such as Mistral and Llama, these models still underperform compared to fine-tuned NLLB-200 3.3B models. Our findings highlight the ongoing…
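As an illustration of the setup described in the abstract, the sketch below shows one way a zero-shot translation prompt for a decoder-only LLM could be assembled for each of the four language directions. The direction list comes from the abstract; the template wording, the `build_prompt` helper, and the sample sentence are assumptions made for illustration and are not the prompt used in the paper.

```python
# Language directions listed in the abstract, as (source, target) pairs.
DIRECTIONS = [
    ("English", "French"),
    ("English", "Portuguese"),
    ("English", "Swahili"),
    ("Swahili", "English"),
]


def build_prompt(source_text: str, src_lang: str, tgt_lang: str) -> str:
    """Build a zero-shot translation prompt.

    Hypothetical template (an assumption, not the paper's prompt):
    the source sentence is labelled with its language, and the target
    language label is left open for the model to complete.
    """
    return f"{src_lang}: {source_text.strip()}\n{tgt_lang}:"


if __name__ == "__main__":
    # Illustrative sentence only; not taken from the paper's data.
    sample = "The patient has a fever."
    for src, tgt in DIRECTIONS:
        if src == "English":
            print(f"[{src}-to-{tgt}]")
            print(build_prompt(sample, src, tgt))
```

An encoder-decoder MT model such as NLLB-200 would not need a prompt of this kind, since the source and target languages are supplied to the model directly rather than through instruction text.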
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Knowledge Distillation · LLaMA · Focus
