XRAG: Cross-lingual Retrieval-Augmented Generation
Wei Liu, Sony Trenous, Leonardo F. R. Ribeiro, Bill Byrne, Felix Hieber

TL;DR
XRAG is a new benchmark for evaluating large language models' ability to perform cross-lingual retrieval-augmented generation, highlighting challenges in language correctness and reasoning across languages.
Contribution
The paper introduces XRAG, a cross-lingual RAG benchmark whose complex reasoning questions are built from news articles. It exposes challenges in response language correctness and in reasoning over retrieved information across languages.
Findings
Models struggle with response language correctness in monolingual retrieval.
Reasoning over multilingual retrieved information is a key challenge.
XRAG exposes gaps in LLM reasoning abilities across languages.
Abstract
We propose XRAG, a novel benchmark designed to evaluate the generation abilities of LLMs in cross-lingual Retrieval-Augmented Generation (RAG) settings where the user language does not match the retrieval results. XRAG is constructed from recent news articles to ensure that its questions require external knowledge to be answered. It covers the real-world scenarios of monolingual and multilingual retrieval, and provides relevancy annotations for each retrieved document. Our novel dataset construction pipeline results in questions that require complex reasoning, as evidenced by the significant gap between human and LLM performance. Consequently, XRAG serves as a valuable benchmark for studying LLM reasoning abilities, even before considering the additional cross-lingual complexity. Experimental results on five LLMs uncover two previously unreported challenges in cross-lingual RAG: 1) in…
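The setup described in the abstract can be pictured with a small sketch. This is not the authors' data format: the class names, field names, and helper functions below are hypothetical and chosen only for illustration. It assumes each benchmark item pairs a question in the user's language with retrieved documents that carry a language code and a relevancy annotation, and that the language of each model response has already been identified by some external tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedDoc:
    text: str
    language: str   # hypothetical field: language code such as "en"
    relevant: bool  # hypothetical field: the relevancy annotation


@dataclass
class Example:
    question: str
    user_language: str
    docs: List[RetrievedDoc]


def retrieval_setting(example: Example) -> str:
    """Label an example by how many languages its retrieved documents span."""
    if not example.docs:
        raise ValueError("example has no retrieved documents")
    languages = {doc.language for doc in example.docs}
    return "monolingual" if len(languages) == 1 else "multilingual"


def is_cross_lingual(example: Example) -> bool:
    """True if at least one retrieved document is not in the user's language."""
    return any(doc.language != example.user_language for doc in example.docs)


def language_correctness(response_languages: List[str],
                         user_languages: List[str]) -> float:
    """Fraction of responses whose language matches the user's language."""
    pairs = list(zip(response_languages, user_languages))
    if not pairs:
        return 0.0
    return sum(resp == user for resp, user in pairs) / len(pairs)
```

Under these assumptions, a German question whose retrieved documents are all in English would be labeled monolingual and cross-lingual, and a model that answers it in English would count against the language correctness rate.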
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
