Multilingual European Language Models: Benchmarking Approaches and Challenges

Fabio Barth; Georg Rehm

arXiv:2502.12895·cs.CL·April 3, 2025

Open Access

TL;DR

This paper critically examines multilingual European benchmarks for large language models, identifying key challenges and proposing solutions to improve evaluation methods for diverse languages and cultural contexts.

Contribution

It provides a comprehensive analysis of seven multilingual benchmarks, highlights major challenges, and suggests strategies like human-in-the-loop verification to improve multilingual LLM evaluation.

Findings

1. Seven multilingual benchmarks analyzed
2. Identified four major evaluation challenges
3. Proposed solutions for translation quality and bias mitigation

Abstract

The breakthrough of generative large language models (LLMs) that can solve different tasks through chat interaction has led to a significant increase in the use of general benchmarks to assess the quality or performance of these models beyond individual applications. There is also a need for better methods to evaluate and also to compare models due to the ever increasing number of new models published. However, most of the established benchmarks revolve around the English language. This paper analyses the benefits and limitations of current evaluation datasets, focusing on multilingual European benchmarks. We analyse seven multilingual benchmarks and identify four major challenges. Furthermore, we discuss potential solutions to enhance translation quality and mitigate cultural biases, including human-in-the-loop verification and iterative translation ranking. Our analysis highlights the…

Taxonomy

Topics: European and International Law Studies · Natural Language Processing Techniques · Government, Law, and Information Management

Methods: Attentive Walk-Aggregating Graph Neural Network