Multi-Method Validation of Large Language Model Medical Translation Across High- and Low-Resource Languages
Chukwuebuka Anyaegbuna, Eduardo Juan Perez Guerrero, Jerry Liu, Timothy Keyes, April Liang, Natasha Steele, Stephen Ma, Jonathan Chen, Kevin Schulman

TL;DR
This study evaluates the accuracy of large language models in translating medical documents across high-, medium-, and low-resource languages, demonstrating high semantic preservation and potential for improving healthcare language access.
Contribution
The paper provides a five-layer validation framework for assessing frontier LLMs in medical translation across eight languages, and reports that translation quality holds up across high-, medium-, and low-resource languages.
Findings
All four models achieved high semantic preservation (LaBSE > 0.92) across 704 translation pairs.
Translation quality did not differ significantly between high- and low-resource languages (p = 0.066).
Inter-model concordance across the four independently trained models was high (LaBSE: 0.946), indicating consistent translation fidelity.
Abstract
Language barriers affect 27.3 million U.S. residents with non-English language preference, yet professional medical translation remains costly and often unavailable. We evaluated four frontier large language models (GPT-5.1, Claude Opus 4.5, Gemini 3 Pro, Kimi K2) translating 22 medical documents into 8 languages spanning high-resource (Spanish, Chinese, Russian, Vietnamese), medium-resource (Korean, Arabic), and low-resource (Tagalog, Haitian Creole) categories using a five-layer validation framework. Across 704 translation pairs, all models achieved high semantic preservation (LaBSE greater than 0.92), with no significant difference between high- and low-resource languages (p = 0.066). Cross-model back-translation confirmed results were not driven by same-model circularity (delta = -0.0009). Inter-model concordance across four independently trained models was high (LaBSE: 0.946), and…
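The count of 704 translation pairs follows from the design stated in the abstract: 22 documents × 8 languages × 4 models. The Python sketch below enumerates that grid and shows one way a semantic-preservation score could be computed. The abstract does not say how the LaBSE score is derived; cosine similarity between sentence embeddings is assumed here as a common convention, and the function names and toy vectors are hypothetical stand-ins rather than the authors' pipeline.

```python
import math
from itertools import product

# Design parameters taken from the abstract.
MODELS = ["GPT-5.1", "Claude Opus 4.5", "Gemini 3 Pro", "Kimi K2"]
LANGUAGES = {
    "high": ["Spanish", "Chinese", "Russian", "Vietnamese"],
    "medium": ["Korean", "Arabic"],
    "low": ["Tagalog", "Haitian Creole"],
}
N_DOCUMENTS = 22


def translation_grid():
    """Enumerate every (document index, language, model) combination."""
    languages = [lang for tier in LANGUAGES.values() for lang in tier]
    return list(product(range(N_DOCUMENTS), languages, MODELS))


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed scoring rule)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


grid = translation_grid()
print(len(grid))  # 22 * 8 * 4 = 704

# Toy vectors standing in for embeddings of a source text and its back-translation.
source_vec = [1.0, 0.0]
back_translated_vec = [1.0, 0.0]
print(cosine_similarity(source_vec, back_translated_vec))
```

In a real evaluation the toy vectors would be replaced by embeddings from a multilingual sentence encoder, and the per-pair scores would be aggregated by language tier before any significance test.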
Taxonomy
Topics: Interpreting and Communication in Healthcare · Cultural Competency in Health Care · Health Policy Implementation Science
