TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Jafar Isbarov, Arofat Akhundjanova, Mammad Hajili, Kavsar Huseynova, Dmitry Gaynullin, Anar Rzayev, Osman Tursun, Aizirek Turdubaeva, Ilshat Saetov, Rinat Kharisov, Saule Belginova, Ariana Kenbayeva, Amina Alisheva, Abdullatif Köksal, Samir Rustamov, Duygu Ataman

TL;DR
This paper introduces TUMLU, a comprehensive, natively developed benchmark for evaluating how well AI models understand Turkic languages. It addresses the lack of high-quality, culturally nuanced evaluation datasets for these under-represented languages.
Contribution
The paper presents TUMLU and TUMLU-mini benchmarks, the first native Turkic language understanding datasets, and evaluates various large language models on these benchmarks.
Findings
Open and proprietary LLMs show varied performance across Turkic languages.
TUMLU benchmarks reveal strengths and weaknesses of models in different linguistic and cultural contexts.
The datasets and evaluation scripts are released to foster further research.
Abstract
Being able to thoroughly assess massive multi-task language understanding (MMLU) capabilities is essential for advancing the applicability of multilingual language models. However, preparing such benchmarks natively and at high quality is often costly, which limits the representativeness of evaluation datasets. While recent efforts have focused on building more inclusive MMLU benchmarks, these are conventionally built using machine translation from high-resource languages, which may introduce errors and fail to account for the linguistic and cultural intricacies of the target languages. In this paper, we address the lack of native-language MMLU benchmarks, especially in the under-represented Turkic language family, which has distinct morphosyntactic and cultural characteristics. We propose two benchmarks for Turkic language MMLU: TUMLU is a comprehensive, multilingual, and natively…
Taxonomy
Topics: Linguistics and Cultural Studies · Natural Language Processing Techniques · Educational Methods and Analysis
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Multi-Head Attention · Adam · Softmax · Dropout · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning
