From Words to Worlds: Benchmarking Cross-Cultural Cultural Understanding in Machine Translation
Bangju Han, Yingqi Wang, Huang Qing, Tiyuan Li, Fengyi Yang, Ahtamjan Ahmat, Abibulla Atawulla, Yating Yang, Xi Zhou

TL;DR
This paper introduces CulT-Eval, a comprehensive benchmark for evaluating machine translation models on culturally grounded expressions, revealing current models' struggles with preserving cultural nuances and proposing improved evaluation metrics.
Contribution
The paper presents CulT-Eval, a new benchmark with a systematic framework and error taxonomy for assessing how machine translation systems handle culture-specific expressions.
Findings
Current models often fail to preserve cultural meanings.
Existing metrics overlook culturally induced meaning deviations.
CulT-Eval reveals systematic failure modes in translation models.
Abstract
Culturally grounded expressions, such as idioms, slang, and culture-specific items (CSIs), are pervasive in natural language and encode meanings that go beyond literal linguistic form. Accurately translating such expressions remains challenging for machine translation systems. Despite this, existing benchmarks remain fragmented and do not provide a systematic framework for evaluating translation performance on culture-loaded expressions. To address this gap, we introduce CulT-Eval, a benchmark designed to evaluate how models handle different types of culturally grounded expressions. CulT-Eval comprises over 7,959 carefully curated instances spanning multiple expression types, together with a comprehensive error taxonomy for their translation. Through extensive evaluation of large language models and detailed analysis, we identify recurring and systematic failure…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Sentiment Analysis and Opinion Mining
