Enhancing Multilingual RAG Systems with Debiased Language Preference-Guided Query Fusion

Jeonghyun Park; Byeongjeong Kim; Seojin Hwang; Hwanhee Lee

arXiv:2601.02956 · cs.CL · April 14, 2026

TL;DR

This paper shows that the perceived English preference of multilingual RAG systems is distorted by structural biases in evaluation benchmarks. It proposes DeLP and DELTA to mitigate these biases and improve cross-lingual retrieval and generation.

Contribution

The paper introduces DeLP, a debiased metric for assessing language preference, and DELTA, a framework leveraging monolingual alignment to enhance multilingual RAG performance.

Findings

01

DeLP reveals that the apparent English preference stems from how evidence is distributed across languages, not from an inherent bias toward English.

02

Retrievers favor monolingual query-document alignment, i.e., queries and documents written in the same language.

03

DELTA outperforms baseline methods across multiple languages.

Abstract

Multilingual Retrieval-Augmented Generation (mRAG) systems often exhibit a perceived preference for high-resource languages, particularly English, resulting in the widespread adoption of English pivoting. While prior studies attribute this advantage to the superior English-centric capabilities of Large Language Models (LLMs), we find that such measurements are significantly distorted by structural priors inherent in evaluation benchmarks. Specifically, we identify exposure bias and a gold availability prior (both driven by the disproportionate concentration of resources in English), as well as cultural priors rooted in topic locality, as factors that hinder accurate assessment of genuine language preference. To address these biases, we propose DeLP (Debiased Language Preference), a calibrated metric designed to explicitly factor out these structural confounds. Our analysis using DeLP…
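The abstract does not give DeLP's formula. As a rough illustration of what "factoring out" exposure bias can mean, the sketch below assumes a simple exposure-normalized ratio: a language's share among the documents a system actually used, divided by its share among the candidate documents it was exposed to. This is a hypothetical stand-in, not the paper's DeLP definition; the function name `exposure_normalized_preference` and the observed/expected ratio are assumptions for illustration only.

```python
from collections import Counter


def exposure_normalized_preference(used_langs, candidate_langs):
    """Hypothetical exposure-normalized language preference (NOT the paper's DeLP).

    used_langs: language tags of documents the system actually used/cited.
    candidate_langs: language tags of all candidate documents it was exposed to.
    Returns {lang: observed_share / expected_share} for each candidate language.
    1.0 means no preference beyond exposure; >1 over-used; <1 under-used.
    """
    if not used_langs or not candidate_langs:
        raise ValueError("both language lists must be non-empty")

    used = Counter(used_langs)
    cand = Counter(candidate_langs)
    n_used = sum(used.values())
    n_cand = sum(cand.values())

    scores = {}
    for lang, count in cand.items():
        expected = count / n_cand            # share the language had in the pool
        observed = used.get(lang, 0) / n_used  # share it had among used documents
        scores[lang] = observed / expected
    # Languages that appear in used_langs but never in the candidate pool are ignored.
    return scores
```

With 6 English and 4 Korean candidates, a system that uses 3 English and 2 Korean documents looks English-leaning by raw share (60% English), yet scores 1.0 for both languages once exposure is taken into account. Only a shift away from the candidate distribution, such as using 4 English and 1 Korean document, moves the scores away from 1.0.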