LLMzSzŁ: a comprehensive LLM benchmark for Polish
Krzysztof Jassem, Michał Ciesiółka, Filip Graliński, Piotr Jabłoński, Jakub Pokrywka, Marek Kubis, Monika Jabłońska, Ryszard Staruch

TL;DR
This paper presents LLMzSzŁ, the first comprehensive Polish-language benchmark based on national exams, and uses it to evaluate how well various LLMs transfer knowledge between languages and whether they can assist in exam validation.
Contribution
It introduces a large-scale Polish benchmark for LLMs, analyzing multilingual and monolingual models' performance on diverse exam questions and their correlation with human results.
Findings
Multilingual LLMs can achieve higher accuracy than monolingual models.
Monolingual models may be advantageous when model size is a constraint.
LLMs show potential in exam validation and error detection.
Abstract
This article introduces the first comprehensive benchmark for the Polish language at this scale: LLMzSzŁ (LLMs Behind the School Desk). It is based on a coherent collection of Polish national exams, including both academic and professional tests extracted from the archives of the Polish Central Examination Board. It covers 4 types of exams, coming from 154 domains. Altogether, it consists of almost 19k closed-ended questions. We investigate the performance of open-source multilingual, English, and Polish LLMs to verify LLMs' abilities to transfer knowledge between languages. Also, the correlation between LLMs and humans at model accuracy and exam pass rate levels is examined. We show that multilingual LLMs can obtain superior results over monolingual ones; however, monolingual models may be beneficial when model size matters. Our analysis highlights the potential of LLMs in assisting…
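As an illustration of the two quantities the abstract mentions, the following minimal sketch computes accuracy on closed-ended questions and then correlates per-exam model accuracy with human pass rates. It is not the authors' evaluation code: the function names, the choice of the Pearson coefficient, and all numbers are assumptions made up for the example.

```python
from math import sqrt


def accuracy(predictions, answers):
    """Fraction of closed-ended questions answered correctly."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one model's accuracy on three exams and the
# human pass rate on the same exams. These values are invented.
model_accuracy_per_exam = [0.62, 0.71, 0.55]
human_pass_rate_per_exam = [0.80, 0.88, 0.74]

if __name__ == "__main__":
    print(accuracy(["A", "B", "C", "D"], ["A", "B", "C", "A"]))
    print(pearson(model_accuracy_per_exam, human_pass_rate_per_exam))
```

A rank-based coefficient such as Spearman's would be an equally reasonable choice when only the ordering of exams by difficulty matters.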
Taxonomy
Topics: Library Science and Information Systems · Natural Language Processing Techniques
