Spanish and LLM Benchmarks: is MMLU Lost in Translation?
Irene Plaza, Nina Melero, Cristina del Pozo, Javier Conde, Pedro Reviriego, Marina Mayor-Rocher, María Grandury

TL;DR
This paper investigates how automatic translation affects the evaluation of Large Language Models on the MMLU benchmark in Spanish, and finds that translation errors significantly affect the measured performance.
Contribution
It highlights the limitations of translating benchmarks automatically and advocates for expert adaptation to improve multilingual LLM evaluations.
Findings
Translation errors account for many performance discrepancies.
Automatic translations can alter test answers and outcomes.
Manual review shows the need for better benchmark localization.
Abstract
The evaluation of Large Language Models (LLMs) is a key element in their continuous improvement process and many benchmarks have been developed to assess the performance of LLMs in different tasks and topics. As LLMs become adopted worldwide, evaluating them in languages other than English is increasingly important. However, most LLM benchmarks are simply translated using an automated tool and then run in the target language. This means that the results depend not only on the LLM performance in that language but also on the quality of the translation. In this paper, we consider the case of the well-known Massive Multitask Language Understanding (MMLU) benchmark. Selected categories of the benchmark are translated into Spanish using Azure Translator and ChatGPT4 and run on ChatGPT4. Next, the results are processed to identify the test items that produce different answers in Spanish and…
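The comparison step described in the abstract, identifying test items that receive different answers across languages, can be sketched roughly as follows. This is a minimal illustration rather than the authors' code. It assumes the Spanish run is compared against a run of the same items in the original English (the abstract is cut off at that point), and that each run is available as a mapping from item id to chosen answer letter. The function name `find_divergent_items` and the toy data are hypothetical.

```python
from typing import Dict, List


def find_divergent_items(
    answers_en: Dict[str, str], answers_es: Dict[str, str]
) -> List[str]:
    """Return the ids of items present in both runs whose answers differ.

    Assumption: answers are single option letters (e.g. "A".."D");
    whitespace and letter case are normalized before comparing.
    """
    shared_ids = answers_en.keys() & answers_es.keys()
    return sorted(
        item_id
        for item_id in shared_ids
        if answers_en[item_id].strip().upper() != answers_es[item_id].strip().upper()
    )


# Hypothetical toy data: item id -> answer letter chosen by the model.
answers_en = {"q1": "A", "q2": "C", "q3": "B"}
answers_es = {"q1": "A", "q2": "D", "q3": "b"}

divergent = find_divergent_items(answers_en, answers_es)
print(divergent)
```

Items flagged this way are only candidates for manual review: a differing answer may come from a translation error, but also from the model's own variability across languages.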
Taxonomy
Topics: Legal Language and Interpretation · Comparative and International Law Studies · European and International Law Studies
