Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish

Yakup Abrek Er; Ilker Kesen; Gözde Gül Şahin; Aykut Erdem

arXiv:2508.16431·cs.CL·August 25, 2025

TL;DR

Cetvel is a comprehensive Turkish benchmark that evaluates large language models across diverse tasks, with an emphasis on linguistic and cultural content. It reveals that Turkish-specific models generally lag behind multilingual ones.

Contribution

Introduces Cetvel, a novel Turkish benchmark with diverse, culturally relevant tasks, filling gaps in existing Turkish LLM evaluation frameworks.

Findings

01

Turkish-centric instruction-tuned models generally underperform multilingual or general-purpose models.

02

Grammatical error correction and extractive QA are highly discriminative tasks.

03

Multilingual or general-purpose models such as Llama 3 and Mistral generally outperform Turkish-specific models.

Abstract

We introduce Cetvel, a comprehensive benchmark designed to evaluate large language models (LLMs) in Turkish. Existing Turkish benchmarks often lack task diversity, culturally relevant content, or both. Cetvel addresses these gaps by combining a broad range of discriminative and generative tasks while ensuring that the content reflects the linguistic and cultural richness of the Turkish language. Cetvel covers 23 tasks grouped into seven categories, including grammatical error correction, machine translation, and question answering rooted in Turkish history and idiomatic language. We evaluate 33 open-weight LLMs (up to 70B parameters) covering different model families and instruction paradigms. Our experiments reveal that Turkish-centric instruction-tuned models generally underperform relative to multilingual or general-purpose models (e.g., Llama 3 and Mistral), despite…
