ZNO-Eval: Benchmarking reasoning capabilities of large language models in Ukrainian

Mykyta Syromiatnikov; Victoria Ruvinskaya; Anastasiya Troynina

arXiv:2501.06715·cs.CL·January 14, 2025

TL;DR

This paper introduces ZNO-Eval, a comprehensive benchmark for assessing reasoning abilities of large language models in Ukrainian, based on real exam tasks across multiple subjects, revealing strengths and gaps in current models.

Contribution

The paper presents the first Ukrainian reasoning benchmark derived from standardized exams, enabling detailed evaluation of LLMs across diverse subjects and complexities.

Findings

01. GPT-4o outperforms the other evaluated models in reasoning and language tasks.
02. Gemini Pro and GPT-4 Turbo excel in arithmetic problems.
03. Models score near the maximum in history and geography but lag in Ukrainian language and mathematics.

Abstract

As the usage of large language models for problems outside of simple text understanding or generation increases, assessing their abilities and limitations becomes crucial. While significant progress has been made in this area over the last few years, most research has focused on benchmarking English, leaving other languages underexplored. This makes evaluating the reasoning and robustness level of language models in Ukrainian particularly challenging. The purpose of this work is to establish a comprehensive benchmark for the reasoning capabilities evaluation of large language models in the Ukrainian language. This paper presents the ZNO-Eval benchmark based on real exam tasks from Ukraine's standardized educational testing system: the External Independent Evaluation and the National Multi-subject Test. With single-answer options, multiple-choice, matching, and open-ended questions from…

Code & Models

Repositories

nlpforua/zno (Official)

Taxonomy

Methods: Absolute Position Encodings · Cosine Annealing · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer