Investigating Language Preference of Multilingual RAG Systems

Jeonghyun Park; Hwanhee Lee

arXiv:2502.11175 · cs.CL · June 3, 2025

TL;DR

This paper investigates language preferences in multilingual RAG systems, revealing biases in retrieval and generation, and proposes DKM-RAG to mitigate these biases and improve multilingual performance.

Contribution

It systematically analyzes language biases in mRAG and introduces DKM-RAG, a novel framework that fuses translated passages to reduce language preference issues.
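
The fusion idea can be illustrated with a minimal sketch. It is not the authors' implementation: this page only says that DKM-RAG fuses translated passages, so everything else here is an assumption. The `translate` callable, the `max_passages` cap, the order-preserving concatenation with duplicate removal, and the prompt layout in `build_prompt` are all hypothetical choices made for illustration.

```python
from typing import Callable, List


def fuse_translated_passages(
    retrieved: List[str],
    translate: Callable[[str], str],
    max_passages: int = 10,
) -> List[str]:
    """Hypothetical fusion step (an assumption, not the paper's method).

    Translated passages come first, followed by the originals; exact
    duplicates are dropped while preserving order.
    """
    # `translate` is a placeholder for any translation function into the query language.
    translated = [translate(p) for p in retrieved]
    fused: List[str] = []
    seen = set()
    for passage in translated + retrieved:
        key = passage.strip()
        if key and key not in seen:
            seen.add(key)
            fused.append(key)
    return fused[:max_passages]


def build_prompt(query: str, passages: List[str]) -> str:
    """Assemble a simple numbered-context prompt (layout is an assumption)."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

With a toy `translate` that maps a French passage to its English equivalent, the fused list holds the English text once, followed by the French original, so the generator sees the evidence in the query language first.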

Findings

01. Retrievers prefer high-resource and query languages.

02. Generators favor query language or Latin scripts.

03. DKM-RAG improves multilingual retrieval and generation performance.

Abstract

Multilingual Retrieval-Augmented Generation (mRAG) systems enhance language models by integrating external multilingual information to produce context-aware responses. However, mRAG systems struggle to retrieve relevant information because of linguistic variations between queries and documents, and they generate inconsistent responses when multilingual sources conflict. In this work, we systematically investigate language preferences in both the retrieval and generation stages of mRAG through a series of experiments. Our analysis indicates that retrievers tend to prefer high-resource and query languages, yet this preference does not consistently improve generation performance. Moreover, we observe that generators prefer the query language or Latin scripts, leading to inconsistent outputs. To overcome these issues, we propose Dual Knowledge Multilingual RAG (DKM-RAG), a simple yet effective framework that…

Taxonomy

Topics: Speech and dialogue systems

Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Softmax · Dropout · Weight Decay · BART · WordPiece · Layer Normalization