MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin

TL;DR
This paper introduces MERA, an open benchmark for evaluating Russian-language foundation models across multiple skills, providing a standardized, comprehensive assessment framework to understand their capabilities and limitations.
Contribution
The paper presents MERA, a new instruction benchmark, together with an evaluation methodology, open-source tools, and a leaderboard for assessing Russian-language foundation models in zero- and few-shot settings.
Findings
Open LMs lag behind human performance.
MERA covers 21 tasks across 11 skill domains.
The benchmark facilitates standardized evaluation of Russian-language LMs.
Abstract
Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. The paper introduces a methodology to evaluate FMs and LMs in…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Balanced Selection
