Evaluating LLM-Based Translation of a Low-Resource Technical Language: The Medical and Philosophical Greek of Galen

James L. Zainaldin; Cameron Pattison; Manuela Marai; Jacob Wu; Mark J. Schiefsky

arXiv:2602.24119 · cs.CL · April 16, 2026

TL;DR

This study assesses the quality of LLM translations of ancient Greek texts, comparing automated metrics with expert judgment, and identifies key factors influencing translation success and failure.

Contribution

It provides the first systematic expert evaluation of LLM translation quality for ancient languages and highlights the impact of terminology rarity on translation failures.

Findings

1. LLMs achieved high quality on the expository text (mean MQM score 95.2/100).

2. Translation quality was lower (mean MQM score 79.9/100) and bimodally distributed on the pharmacological text, with failures linked to terminology density.

3. Automated metrics correlated only moderately with human judgment, especially on texts of variable quality.
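One common way to quantify agreement between an automated metric and expert scores is a rank correlation. The sketch below computes Spearman's rho in plain Python. It is an illustration only: this summary does not state which correlation statistic the authors used, so the choice of Spearman is an assumption, the function names are hypothetical, and the sample values are made-up placeholders, not data from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return pearson(ranks(xs), ranks(ys))


# Placeholder values for illustration only (NOT results from the paper):
# one automated-metric score and one human MQM score per translation.
metric_scores = [0.62, 0.71, 0.55, 0.80, 0.47]
mqm_scores = [88.0, 93.5, 90.0, 97.0, 61.0]
rho = spearman(metric_scores, mqm_scores)
```

A rank-based statistic is a reasonable default here because MQM scores and automated metrics live on different scales, and only their ordering of translations needs to agree.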

Abstract

Purpose: This study evaluates the quality of commercial large language model (LLM) machine translation (MT) for Ancient Greek technical prose and benchmarks standard automated MT evaluation metrics against expert human judgment. Design: We evaluated 60 translations by three LLMs (ChatGPT, Claude, Gemini) of 20 paragraph-length passages from two works by the Greek physician Galen (c. 129–216 CE): an expository text with two published English translations and a pharmacological text never before translated. Quality was assessed using seven automated metrics and systematic reference-free human evaluation via a modified Multidimensional Quality Metrics (MQM) framework applied by domain specialists. Findings: On the translated expository text, LLMs achieved high quality (mean MQM score 95.2/100). On the untranslated pharmacological text, quality was lower (79.9/100) but bimodally…
