"Be My Cheese?": Cultural Nuance Benchmarking for Machine Translation in Multilingual LLMs
Madison Van Doren, Casey Ford, Jennifer Barajas, and Cory Holland

TL;DR
This paper introduces a large-scale human evaluation benchmark to assess cultural nuance in machine translation by multilingual LLMs, revealing significant gaps in translating culturally grounded language elements.
Contribution
It presents the first multilingual, human-annotated benchmark specifically focused on cultural nuance in translation, emphasizing the need for culturally informed models and evaluation methods.
Findings
GPT-5 achieves the highest overall quality (2.10/3, against a cross-model mean of 1.68/3).
Holidays and cultural concepts translate better than idioms and puns.
Idioms are most often left untranslated.
Abstract
We present a large-scale human evaluation benchmark for assessing cultural localisation in machine translation produced by state-of-the-art multilingual large language models (LLMs). Existing MT benchmarks emphasise token-level and grammatical accuracy, but often overlook pragmatic and culturally grounded competencies required for real-world localisation. Building on a pilot study of 87 translations across 20 languages, we evaluate 7 multilingual LLMs across 15 target languages with 5 native-speaker raters per language. Raters scored both full-text translations and segment-level instances of culturally nuanced language (idioms, puns, holidays, and culturally embedded concepts) on an ordinal 0-3 quality scale; segment ratings additionally included an NA option for untranslated segments. Across full-text evaluations, mean overall quality is modest (1.68/3): GPT-5 (2.10/3), Claude…
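The segment-level rating scheme described in the abstract (an ordinal 0-3 score plus an NA option for untranslated segments) could be summarised per category roughly as follows. This is a minimal sketch, not the authors' analysis code: the `ratings` data is invented for illustration, the `summarise` helper is hypothetical, and the choice to exclude NA ratings from the mean while reporting them as a separate rate is an assumption, since the excerpt does not say how NA values were aggregated.

```python
from statistics import mean

# Invented example ratings: (category, score).
# score is an integer 0-3, or None for NA (segment left untranslated).
ratings = [
    ("idiom", None),
    ("idiom", 1),
    ("pun", 0),
    ("pun", 2),
    ("holiday", 3),
    ("holiday", 2),
    ("concept", 3),
]


def summarise(rows):
    """Return per-category mean quality (NA excluded) and NA rate."""
    by_category = {}
    for category, score in rows:
        by_category.setdefault(category, []).append(score)

    summary = {}
    for category, scores in by_category.items():
        rated = [s for s in scores if s is not None]
        summary[category] = {
            # Assumption: NA ratings do not count towards the mean.
            "mean_quality": mean(rated) if rated else None,
            # Share of ratings marked NA (untranslated).
            "na_rate": (len(scores) - len(rated)) / len(scores),
        }
    return summary


summary = summarise(ratings)
for category, stats in summary.items():
    print(category, stats)
```

Keeping the NA rate separate from the mean lets a category such as idioms show both how often segments were left untranslated and how well the translated ones scored.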
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution
