TurkBench: A Benchmark for Evaluating Turkish Large Language Models
Çağrı Toraman, Ahmet Kaan Sever, Ayse Aysu Cengiz, Elif Ecem Arslan, Görkem Sevinç, Mete Mert Birdal, Yusuf Faruk Güldemir, Ali Buğra Kanburoğlu, Sezen Felekoğlu, Osman Gürlek, Sarp Kantar, Birsen Şahin Kütük, Büşra Tufan

TL;DR
TurkBench is a benchmark for assessing how well generative large language models perform in Turkish across a range of linguistic and reasoning tasks.
Contribution
This paper introduces TurkBench, the first extensive Turkish language model benchmark, with 8,151 samples across 21 subtasks in six categories, filling a critical gap in language-specific evaluation tools.
Findings
TurkBench covers diverse tasks including knowledge, reasoning, and grammar.
It provides a culturally relevant dataset for Turkish language model evaluation.
The benchmark is publicly available for online submissions.
Abstract
With the recent surge in the development of large language models, the need for comprehensive and language-specific evaluation benchmarks has become critical. While significant progress has been made in evaluating English-language models, benchmarks for other languages, particularly those with unique linguistic characteristics such as Turkish, remain less developed. Our study introduces TurkBench, a comprehensive benchmark designed to assess the capabilities of generative large language models in the Turkish language. TurkBench involves 8,151 data samples across 21 distinct subtasks. These are organized under six main categories of evaluation: Knowledge, Language Understanding, Reasoning, Content Moderation, Turkish Grammar and Vocabulary, and Instruction Following. The diverse range of tasks and the culturally relevant data would provide researchers and developers with a valuable tool…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
