TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish
Arda Yüksel, Abdullatif Köksal, Lütfi Kerem Şenel, Anna Korhonen, Hinrich Schütze

TL;DR
TurkishMMLU is a comprehensive Turkish language benchmark with over 10,000 questions across various subjects, designed to evaluate the reasoning, comprehension, and cultural understanding of diverse large language models.
Contribution
It introduces the first high-school level Turkish QA benchmark, addressing limitations of translation-based evaluation and providing a detailed analysis of LLMs' Turkish language capabilities.
Findings
Multilingual and Turkish-specific models show varied performance.
Chain-of-thought reasoning improves accuracy.
Model performance varies across subjects and difficulty levels.
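A per-subject or per-difficulty comparison like the one summarized above amounts to grouping multiple-choice predictions and computing accuracy within each group. The following is a minimal sketch, not the authors' evaluation code: the record fields (`subject`, `difficulty`, `answer`, `prediction`) and the toy records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Group records by records[i][key] and return {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        # A prediction counts as correct only if it matches the gold letter.
        if r["prediction"] == r["answer"]:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}


# Toy, made-up records (hypothetical field names; not benchmark data).
records = [
    {"subject": "Mathematics", "difficulty": "hard", "answer": "B", "prediction": "C"},
    {"subject": "Mathematics", "difficulty": "easy", "answer": "A", "prediction": "A"},
    {"subject": "History", "difficulty": "easy", "answer": "D", "prediction": "D"},
    {"subject": "History", "difficulty": "hard", "answer": "C", "prediction": "C"},
]

by_subject = accuracy_by(records, "subject")
by_difficulty = accuracy_by(records, "difficulty")
```

The same helper can be run once on direct-answer predictions and once on chain-of-thought predictions to compare the two prompting styles group by group.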
Abstract
Multiple choice question answering tasks evaluate the reasoning, comprehension, and mathematical abilities of Large Language Models (LLMs). While existing benchmarks employ automatic translation for multilingual evaluation, this approach is error-prone and potentially introduces culturally biased questions, especially in social sciences. We introduce the first multitask, multiple-choice Turkish QA benchmark, TurkishMMLU, to evaluate LLMs' understanding of the Turkish language. TurkishMMLU includes over 10,000 questions, covering 9 different subjects from Turkish high-school education curricula. These questions are written by curriculum experts, suitable for the high-school curricula in Turkey, covering subjects ranging from natural sciences and math questions to more culturally representative topics such as Turkish Literature and the history of the Turkish Republic. We evaluate over 20…
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification
