FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models
Joona Kytöniemi, Jousia Piha, Akseli Reunamo, Fedor Vitiugin, Farrokh Mehryary, Sampo Pyysalo

TL;DR
FIN-bench-v2 is a comprehensive, standardized benchmark suite for evaluating Finnish large language models across diverse tasks, with rigorous task selection and extensive resource sharing.
Contribution
The paper introduces a unified Finnish benchmark suite with improved task-selection criteria and extensive shared resources, making evaluation of Finnish language models more consistent.
Findings
Selected robust tasks based on model learning curves
Evaluated instruction-tuned models across multiple tasks
Resources and datasets are publicly available
Abstract
We introduce FIN-bench-v2, a unified benchmark suite for evaluating large language models in Finnish. FIN-bench-v2 consolidates Finnish versions of widely used benchmarks together with an updated and expanded version of the original FIN-bench into a single, consistently formatted collection, covering multiple-choice and generative tasks across reading comprehension, commonsense reasoning, sentiment analysis, world knowledge, and alignment. All datasets are converted to HuggingFace Datasets, which include both cloze and multiple-choice prompt formulations with five variants per task, and we incorporate human annotation or review for machine-translated resources such as GoldenSwag and XED. To select robust tasks, we pretrain a set of 2.15B-parameter decoder-only models and use their learning curves to compute monotonicity, signal-to-noise, non-random performance, and model ordering…
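The abstract names four task-selection criteria computed from the learning curves of the 2.15B-parameter models, but the formulas are not given in this summary. The sketch below shows one plausible way to compute each criterion from per-checkpoint scores. The definitions are assumptions for illustration only: Spearman correlation between step and score for monotonicity, mean over standard deviation of the last few checkpoints for signal-to-noise, final score minus a chance baseline for non-random performance, and mean Kendall tau-a against the final model ranking for model ordering. All function names are hypothetical and are not the paper's code.

```python
from itertools import combinations
from statistics import mean, stdev


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant curve carries no trend
    return cov / (vx * vy) ** 0.5


def monotonicity(steps, scores):
    """Assumed definition: Spearman correlation of training step vs. score."""
    return pearson(ranks(steps), ranks(scores))


def signal_to_noise(scores, window=3):
    """Assumed definition: mean / std over the last `window` checkpoints."""
    tail = scores[-window:]
    noise = stdev(tail)
    return float("inf") if noise == 0 else mean(tail) / noise


def non_random_margin(scores, random_baseline):
    """Assumed definition: final score minus chance level (e.g. 0.25 for 4-way MC)."""
    return scores[-1] - random_baseline


def kendall_tau_a(a, b):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    n = len(a)
    if n < 2:
        return 0.0
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (a[i] - a[j]) * (b[i] - b[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def ordering_consistency(curves):
    """curves: one score list per model, aligned on the same checkpoints.
    Assumed definition: mean tau-a between each earlier checkpoint's model
    ranking and the ranking at the final checkpoint."""
    final = [c[-1] for c in curves]
    n_ckpt = len(curves[0])
    return mean(kendall_tau_a([c[t] for c in curves], final) for t in range(n_ckpt - 1))


if __name__ == "__main__":
    steps = [1000, 2000, 3000, 4000, 5000]
    scores = [0.26, 0.30, 0.33, 0.35, 0.36]  # toy numbers, not paper results
    print("monotonicity:", monotonicity(steps, scores))
    print("SNR:", signal_to_noise(scores))
    print("above chance:", non_random_margin(scores, random_baseline=0.25))
    curves = [[0.25, 0.30, 0.35], [0.26, 0.28, 0.31], [0.24, 0.27, 0.29]]
    print("ordering:", ordering_consistency(curves))
```

A task would then be kept only if it clears thresholds on all four criteria; the thresholds themselves are not stated in this summary and would have to be taken from the paper.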
