MCEval: A Dynamic Framework for Fair Multilingual Cultural Evaluation of LLMs
Shulin Huang, Linyi Yang, Yue Zhang

TL;DR
MCEval is a comprehensive multilingual framework that evaluates large language models' cultural awareness and cultural bias across 13 cultures and 13 languages, revealing performance disparities linked to language-culture alignment.
Contribution
It introduces the first dynamic, multilingual cultural evaluation framework with causal analysis capabilities for assessing LLMs' cultural understanding.
Findings
Performance varies across linguistic scenarios.
Optimal cultural performance depends on language-culture alignment.
English-centric approaches can cause fairness issues.
Abstract
Large language models exhibit cultural biases and limited cross-cultural understanding capabilities, particularly when serving diverse global user populations. We propose MCEval, a novel multilingual evaluation framework that employs dynamic cultural question construction and enables causal analysis through Counterfactual Rephrasing and Confounder Rephrasing. Our comprehensive evaluation spans 13 cultures and 13 languages, systematically assessing both cultural awareness and cultural bias across different linguistic scenarios. The framework provides 39,897 cultural awareness instances and 17,940 cultural bias instances. Experimental results reveal performance disparities across different linguistic scenarios, demonstrating that optimal cultural performance is linked not only to training data distribution but also to language-culture alignment. The evaluation results also…
Taxonomy
Topics: Interpreting and Communication in Healthcare
