Multilingual European Language Models: Benchmarking Approaches and Challenges
Fabio Barth, Georg Rehm

TL;DR
This paper critically examines multilingual European benchmarks for large language models, identifying key challenges and proposing solutions to improve evaluation methods for diverse languages and cultural contexts.
Contribution
It provides a comprehensive analysis of seven multilingual benchmarks, highlights major challenges, and suggests strategies like human-in-the-loop verification to improve multilingual LLM evaluation.
Findings
Analyzed seven multilingual benchmarks
Identified four major evaluation challenges
Proposed solutions for translation quality and bias mitigation
Abstract
The breakthrough of generative large language models (LLMs) that can solve different tasks through chat interaction has led to a significant increase in the use of general benchmarks to assess the quality or performance of these models beyond individual applications. There is also a need for better methods to evaluate and compare models, given the ever-increasing number of new models published. However, most of the established benchmarks revolve around the English language. This paper analyses the benefits and limitations of current evaluation datasets, focusing on multilingual European benchmarks. We analyse seven multilingual benchmarks and identify four major challenges. Furthermore, we discuss potential solutions to enhance translation quality and mitigate cultural biases, including human-in-the-loop verification and iterative translation ranking. Our analysis highlights the…
Taxonomy
Topics: European and International Law Studies · Natural Language Processing Techniques · Government, Law, and Information Management
Methods: Attentive Walk-Aggregating Graph Neural Network
