TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages

Jafar Isbarov; Arofat Akhundjanova; Mammad Hajili; Kavsar Huseynova; Dmitry Gaynullin; Anar Rzayev; Osman Tursun; Aizirek Turdubaeva; Ilshat Saetov; Rinat Kharisov; Saule Belginova; Ariana Kenbayeva; Amina Alisheva; Abdullatif Köksal; Samir Rustamov; Duygu Ataman

arXiv:2502.11020·cs.CL·June 16, 2025

Open Access 1 Repo 5 Datasets

TL;DR

This paper introduces TUMLU, a comprehensive, natively developed benchmark for evaluating how well AI models understand Turkic languages. It addresses the lack of high-quality, culturally nuanced evaluation datasets for these under-represented languages.

Contribution

The paper presents the TUMLU and TUMLU-mini benchmarks, the first native language understanding datasets for Turkic languages, and evaluates various large language models on them.

Findings

01

Open and proprietary LLMs show varied performance across Turkic languages.

02

TUMLU benchmarks reveal strengths and weaknesses of models in different linguistic and cultural contexts.

03

Release of datasets and evaluation scripts to foster further research.

Abstract

Being able to thoroughly assess massive multi-task language understanding (MMLU) capabilities is essential for advancing the applicability of multilingual language models. However, preparing such benchmarks in high-quality native language is often costly and therefore limits the representativeness of evaluation datasets. While recent efforts have focused on building more inclusive MMLU benchmarks, these are conventionally built using machine translation from high-resource languages, which may introduce errors and fail to account for the linguistic and cultural intricacies of the target languages. In this paper, we address the lack of native-language MMLU benchmarks, especially in the under-represented Turkic language family, which has distinct morphosyntactic and cultural characteristics. We propose two benchmarks for Turkic language MMLU: TUMLU is a comprehensive, multilingual, and natively…

Code & Models

Repositories

ceferisbarov/TUMLU
Official

Taxonomy

Topics: Linguistics and Cultural Studies · Natural Language Processing Techniques · Educational Methods and Analysis

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Multi-Head Attention · Adam · Softmax · Dropout · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning