Quantifying Language Disparities in Multilingual Large Language Models

Songbo Hu, Ivan Vulić, Anna Korhonen

arXiv:2508.17162 · cs.CL · August 26, 2025

TL;DR

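This paper introduces a framework with three interpretable metrics (the performance realisation ratio, its coefficient of variation, and language potential) for quantifying performance disparities across models and languages in multilingual large language models. It targets the fragmented, confounded results typical of large-scale multilingual evaluations and shows that higher overall performance does not guarantee greater fairness across languages.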

Contribution

The paper proposes an interpretable framework whose metrics disentangle confounding factors such as target languages, experimental setups, and model choices, enabling a more accurate assessment of language disparities in multilingual models.

Findings

01. The framework provides more reliable measurements of model and language disparities.

02. Higher overall model performance does not guarantee increased fairness across languages.

03. The approach is effective for evaluating low-resource languages.

Abstract

Results reported in large-scale multilingual evaluations are often fragmented and confounded by factors such as target languages, differences in experimental setups, and model choices. We propose a framework that disentangles these confounding variables and introduces three interpretable metrics (the performance realisation ratio, its coefficient of variation, and language potential), enabling a finer-grained and more insightful quantification of actual performance disparities across both (i) models and (ii) languages. Through a case study of 13 model variants on 11 multilingual datasets, we demonstrate that our framework provides a more reliable measurement of model performance and language disparities, particularly for low-resource languages, which have so far proven challenging to evaluate. Importantly, our results reveal that higher overall model performance does not necessarily…
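
The abstract names the three metrics but this excerpt does not define them, so the following Python sketch rests on assumptions rather than on the authors' formulas. It assumes that language potential is the best score any evaluated model attains on a language, and that the performance realisation ratio is a model's observed score divided by that potential. Only the coefficient of variation (standard deviation divided by the mean) is a standard statistic. The model names, language codes, and scores are invented toy values, not results from the paper.

```python
from statistics import mean, pstdev

# Hypothetical scores[model][language], e.g. accuracy in [0, 1]. Toy data only.
scores = {
    "model_a": {"en": 0.90, "sw": 0.45, "fi": 0.72},
    "model_b": {"en": 0.80, "sw": 0.60, "fi": 0.64},
}


def language_potential(scores):
    """ASSUMED definition: the best score any model attains on each language."""
    languages = next(iter(scores.values())).keys()
    return {lang: max(per_lang[lang] for per_lang in scores.values())
            for lang in languages}


def realisation_ratios(scores, potential):
    """ASSUMED definition: observed score divided by the language's potential."""
    return {
        model: {lang: s / potential[lang] for lang, s in per_lang.items()}
        for model, per_lang in scores.items()
    }


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (standard statistic)."""
    values = list(values)
    return pstdev(values) / mean(values)


potential = language_potential(scores)
ratios = realisation_ratios(scores, potential)

# Spread of each model's ratios across languages; lower means more even.
cv = {model: coefficient_of_variation(r.values()) for model, r in ratios.items()}
raw_mean = {model: mean(per_lang.values()) for model, per_lang in scores.items()}

for model in scores:
    print(model, "raw mean:", round(raw_mean[model], 3), "CV:", round(cv[model], 3))
```

In this toy data, model_a has the slightly higher raw average but the larger spread in its ratios across languages, which is the kind of pattern the second finding describes: a higher overall score need not mean more even treatment of languages.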
