Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Shivalika Singh, Angelika Romanou, Clémentine Fourrier, David I. Adelani, Jian Gang Ngui, Daniel Vila-Suero, Peerat Limkonchotiwat, Kelly Marchisio, Wei Qi Leong, Yosephine Susanto, Raymond Ng, Shayne Longpre, Wei-Yin Ko, Sebastian Ruder, Madeline Smith, Antoine Bosselut

TL;DR
This paper investigates cultural and linguistic biases in multilingual datasets like MMLU, highlighting their impact on model evaluation and introducing Global MMLU, a more culturally aware benchmark across 42 languages.
Contribution
It identifies biases in existing multilingual benchmarks and presents Global MMLU, an improved, culturally sensitive evaluation dataset with broader language coverage and bias annotations.
Findings
28% of MMLU questions require culturally sensitive knowledge to answer
84.9% of questions that require geographic knowledge focus on North America or Europe
Model rankings change depending on whether models are evaluated on the full question set or only on the culturally sensitive subset
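The ranking finding can be illustrated with a minimal sketch, which is not the authors' evaluation code. It assumes each graded answer is a (model, subset, is_correct) tuple. The subset labels "agnostic" and "sensitive", the model names, and the toy records are all hypothetical and stand in for real annotations and results.

```python
from collections import defaultdict

def subset_accuracy(records):
    """Return {subset: {model: accuracy}} from (model, subset, is_correct) tuples."""
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(lambda: defaultdict(int))
    for model, subset, is_correct in records:
        totals[subset][model] += 1
        hits[subset][model] += int(is_correct)
    return {s: {m: hits[s][m] / totals[s][m] for m in totals[s]} for s in totals}

def ranking(scores):
    """Order models from best to worst accuracy; ties are broken by name."""
    return sorted(scores, key=lambda m: (-scores[m], m))

def rank_shifts(acc, subset_a, subset_b):
    """Rank position in subset_b minus rank position in subset_a, per model.

    A positive value means the model ranks lower (worse) on subset_b.
    """
    rank_a = {m: i for i, m in enumerate(ranking(acc[subset_a]))}
    rank_b = {m: i for i, m in enumerate(ranking(acc[subset_b]))}
    return {m: rank_b[m] - rank_a[m] for m in rank_a}

# Hypothetical toy data: model_a leads on culture-agnostic questions,
# model_b leads on culturally sensitive ones.
toy_records = [
    ("model_a", "agnostic", True), ("model_a", "agnostic", True),
    ("model_a", "agnostic", True), ("model_a", "agnostic", False),
    ("model_a", "sensitive", True), ("model_a", "sensitive", False),
    ("model_b", "agnostic", True), ("model_b", "agnostic", True),
    ("model_b", "agnostic", False), ("model_b", "agnostic", False),
    ("model_b", "sensitive", True), ("model_b", "sensitive", True),
]

acc = subset_accuracy(toy_records)
print(ranking(acc["agnostic"]))
print(ranking(acc["sensitive"]))
print(rank_shifts(acc, "agnostic", "sensitive"))
```

Reporting per-subset rankings next to the aggregate score makes it visible when a model's overall lead depends mainly on one subset of questions.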
Abstract
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks. These biases stem not only from differences in language but also from the cultural knowledge required to interpret questions, reducing the practical utility of translated datasets like MMLU. Furthermore, translation often introduces artefacts that can distort the meaning or clarity of questions in the target language. A common practice in multilingual evaluation is to rely on machine-translated evaluation sets, but simply translating a dataset is insufficient to address these challenges. In this work, we trace the impact of both of these issues on multilingual evaluations and ensuing model performances. Our large-scale evaluation of state-of-the-art open and proprietary models illustrates that progress on MMLU depends heavily on learning Western-centric concepts, with 28%…
Taxonomy
Topics: Second Language Learning and Teaching
Methods: Sparse Evolutionary Training · Focus
