GreekMMLU: A Native-Sourced Multitask Benchmark for Evaluating Language Models in Greek

Yang Zhang; Mersin Konomi; Christos Xypolopoulos; Konstantinos Divriotis; Konstantinos Skianis; Giannis Nikolentzos; Giorgos Stamou; Guokan Shang; Michalis Vazirgiannis

arXiv:2602.05150·cs.CL·March 31, 2026

TL;DR

GreekMMLU is a comprehensive, native-sourced Greek language benchmark with 21,805 questions across diverse subjects, designed to evaluate and improve multilingual language models' performance in Greek.

Contribution

The paper introduces the first large-scale, authentic Greek language understanding benchmark, with a detailed subject taxonomy and annotated difficulty levels, enabling robust evaluation of LLMs in Greek.

Findings

01 Open- and closed-source LLMs show significant performance gaps.

02 Greek-adapted models outperform general multilingual models.

03 Model scale, adaptation, and prompting significantly influence performance.

Abstract

Large Language Models (LLMs) are commonly trained on multilingual corpora that include Greek, yet reliable evaluation benchmarks for Greek, particularly those based on authentic, native-sourced content, remain limited. Existing datasets are often machine-translated from English, failing to capture Greek linguistic and cultural characteristics. We introduce GreekMMLU, a native-sourced benchmark for massive multitask language understanding in Greek, comprising 21,805 multiple-choice questions across 45 subject areas, organized under a newly defined subject taxonomy and annotated with educational difficulty levels spanning primary to professional examinations. All questions are sourced or authored in Greek from academic, professional, and governmental exams. We publicly release 16,857 samples and reserve 4,948 samples for a private leaderboard to enable robust and contamination-resistant…
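As a rough illustration of how a multiple-choice benchmark of this kind is typically scored, the sketch below checks the public/private split counts stated in the abstract and computes per-subject accuracy on toy data. The record keys (`subject`, `answer`, `prediction`), the subject names, and the helper functions are hypothetical and chosen for illustration only; they are not the schema of the released dataset or the authors' evaluation code.

```python
from collections import defaultdict

# Counts as stated in the abstract.
TOTAL_QUESTIONS = 21_805
PUBLIC_QUESTIONS = 16_857
PRIVATE_QUESTIONS = 4_948


def public_fraction() -> float:
    """Share of the benchmark that is publicly released."""
    return PUBLIC_QUESTIONS / TOTAL_QUESTIONS


def accuracy_by_subject(records):
    """Return {subject: accuracy} for multiple-choice predictions.

    Each record is a dict with the hypothetical keys "subject", "answer",
    and "prediction". These names are assumptions made for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["subject"]] += 1
        if rec["prediction"] == rec["answer"]:
            correct[rec["subject"]] += 1
    return {subject: correct[subject] / total[subject] for subject in total}


if __name__ == "__main__":
    # Toy records; subjects and answers are made up for the example.
    toy_records = [
        {"subject": "history", "answer": "A", "prediction": "A"},
        {"subject": "history", "answer": "C", "prediction": "B"},
        {"subject": "physics", "answer": "D", "prediction": "D"},
    ]
    print(f"public share: {public_fraction():.3f}")
    print(accuracy_by_subject(toy_records))
```

Reporting accuracy per subject rather than as a single pooled number keeps large subjects from masking weak performance on small ones, which matters when a benchmark spans 45 subject areas of uneven size.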

Code & Models

Datasets

dascim/GreekMMLU
dataset · 2.9k downloads