XRAG: Cross-lingual Retrieval-Augmented Generation

Wei Liu; Sony Trenous; Leonardo F. R. Ribeiro; Bill Byrne; Felix Hieber

arXiv:2505.10089 · cs.CL · May 16, 2025

TL;DR

XRAG is a new benchmark for evaluating large language models' ability to perform cross-lingual retrieval-augmented generation, highlighting challenges in language correctness and reasoning across languages.

Contribution

The paper introduces XRAG, a novel cross-lingual RAG benchmark with complex reasoning questions based on news articles, revealing new challenges in multilingual LLM performance.

Findings

1. Models struggle with response language correctness in monolingual retrieval.
2. Reasoning over multilingual retrieved information is a key challenge.
3. XRAG exposes gaps in LLM reasoning abilities across languages.

Abstract

We propose XRAG, a novel benchmark designed to evaluate the generation abilities of LLMs in cross-lingual Retrieval-Augmented Generation (RAG) settings where the user language does not match the retrieval results. XRAG is constructed from recent news articles to ensure that its questions require external knowledge to be answered. It covers the real-world scenarios of monolingual and multilingual retrieval, and provides relevancy annotations for each retrieved document. Our novel dataset construction pipeline results in questions that require complex reasoning, as evidenced by the significant gap between human and LLM performance. Consequently, XRAG serves as a valuable benchmark for studying LLM reasoning abilities, even before considering the additional cross-lingual complexity. Experimental results on five LLMs uncover two previously unreported challenges in cross-lingual RAG: 1) in…
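The setting described in the abstract can be made concrete with a small data-structure sketch. The Python below is illustrative only and is not the XRAG dataset schema or the authors' code: the names `RetrievedDoc`, `Example`, `retrieval_setting`, and `is_cross_lingual` are hypothetical. It also assumes that "monolingual retrieval" means all retrieved documents share one language and "multilingual retrieval" means they span more than one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedDoc:
    """One retrieved document (hypothetical structure, not the XRAG schema)."""
    text: str
    lang: str       # language code of the document, e.g. "en"
    relevant: bool  # relevancy annotation for this document


@dataclass
class Example:
    """One benchmark item: a question in the user's language plus retrieved documents."""
    question: str
    user_lang: str
    docs: List[RetrievedDoc]


def retrieval_setting(example: Example) -> str:
    """Label the retrieval setting by the number of distinct document languages.

    Assumption: one document language means "monolingual" retrieval,
    more than one means "multilingual" retrieval.
    """
    if not example.docs:
        raise ValueError("example has no retrieved documents")
    doc_langs = {doc.lang for doc in example.docs}
    return "monolingual" if len(doc_langs) == 1 else "multilingual"


def is_cross_lingual(example: Example) -> bool:
    """True if at least one retrieved document is not in the user's language."""
    return any(doc.lang != example.user_lang for doc in example.docs)


# Toy data, invented for illustration only.
mono = Example(
    question="Wer hat die Wahl gewonnen?",
    user_lang="de",
    docs=[
        RetrievedDoc("Article A ...", "en", True),
        RetrievedDoc("Article B ...", "en", False),
    ],
)
multi = Example(
    question="Wer hat die Wahl gewonnen?",
    user_lang="de",
    docs=[
        RetrievedDoc("Article A ...", "en", True),
        RetrievedDoc("Artikel C ...", "de", True),
    ],
)
```

In this sketch both toy examples count as cross-lingual, because each contains at least one document whose language differs from the user's. Only the second one mixes document languages.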

Code & Models

Datasets

AmazonScience/XRAG
dataset · 139 dl

Videos

XRAG: Cross-lingual Retrieval-Augmented Generation · underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies