All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG

Dan Wang; Guozhao Mo; Yafei Shi; Cheng Zhang; Bo Zheng; Boxi Cao; Xuanang Chen; Yaojie Lu; Hongyu Lin; Ben He; Xianpei Han; Le Sun

arXiv:2604.20199 · cs.CL · April 23, 2026

TL;DR

This paper identifies language bias in multilingual retrieval-augmented generation systems and introduces LAURA, a reranker that reduces bias and improves performance across languages.

Contribution

The paper presents LAURA, a novel language-agnostic reranker that aligns evidence ranking with generative utility to mitigate language bias in mRAG systems.

Findings

01

LAURA reduces language bias in mRAG systems.

02

LAURA improves performance across multiple languages.

03

An estimated oracle evidence analysis quantifies a substantial performance gap between existing rerankers and the achievable upper bound.

Abstract

Multilingual Retrieval-Augmented Generation (mRAG) leverages cross-lingual evidence to ground Large Language Models (LLMs) in global knowledge. However, we show that current mRAG systems suffer from a language bias during reranking, systematically favoring English and the query's native language. By introducing an estimated oracle evidence analysis, we quantify a substantial performance gap between existing rerankers and the achievable upper bound. Further analysis reveals a critical distributional mismatch: while optimal predictions require evidence scattered across multiple languages, current systems systematically suppress such "answer-critical" documents, thereby limiting downstream generation performance. To bridge this gap, we propose Language-Agnostic Utility-driven Reranker Alignment (LAURA), which aligns multilingual…
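The language bias described in the abstract can be made concrete with a small measurement. The sketch below is illustrative only: it is not the paper's estimated oracle evidence analysis and not LAURA. It assumes each document carries a `lang` code, and the function names (`language_shares`, `favored_share`), the toy rankings, and the "reference" ordering are all hypothetical. It computes how much of a reranker's top-k is written in English or the query's language, and compares that share against a reference ranking.

```python
from collections import Counter


def language_shares(ranked_docs, k):
    """Return the fraction of the top-k documents written in each language."""
    top = ranked_docs[:k]
    counts = Counter(doc["lang"] for doc in top)
    return {lang: n / len(top) for lang, n in counts.items()}


def favored_share(ranked_docs, k, query_lang):
    """Share of the top-k taken by English plus the query's own language."""
    shares = language_shares(ranked_docs, k)
    favored = {"en", query_lang}
    return sum(share for lang, share in shares.items() if lang in favored)


# Toy data (assumption): the same five documents in two different orders.
reranked = [
    {"id": "d1", "lang": "en"},
    {"id": "d2", "lang": "zh"},
    {"id": "d3", "lang": "en"},
    {"id": "d4", "lang": "ar"},
    {"id": "d5", "lang": "es"},
]
# Hypothetical reference ordering, standing in for a utility-based ranking.
reference = [
    {"id": "d5", "lang": "es"},
    {"id": "d4", "lang": "ar"},
    {"id": "d1", "lang": "en"},
    {"id": "d2", "lang": "zh"},
    {"id": "d3", "lang": "en"},
]

k = 4
query_lang = "zh"
# Positive gap: the reranker gives English and the query language more of
# the top-k than the reference ordering does.
bias_gap = favored_share(reranked, k, query_lang) - favored_share(reference, k, query_lang)
print(f"favored-language share gap at k={k}: {bias_gap:.2f}")
```

A positive gap on real data would only suggest over-representation of English and query-language documents; whether that hurts generation depends on which documents are actually answer-critical, which this sketch does not model.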
