XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation

Qianren Mao; Yangyifei Luo; Qili Zhang; Yashuo Luo; Zhilong Cao; Jinlong Zhang; HanWen Hao; Zhijun Chen; Weifeng Jiang; Junnan Liu; Xiaolong Wang; Zhenting Huang; Zhixing Tan; Sun Jie; Bo Li; Xudong Liu; Richong Zhang; Jianxin Li

arXiv:2412.15529·cs.CL·May 19, 2025

TL;DR

This paper introduces XRAG, a comprehensive benchmark and diagnostic framework for evaluating and improving the core components of retrieval-augmented generation systems, enhancing their accuracy and robustness.

Contribution

We present XRAG, an open-source modular benchmark and diagnostic toolkit for systematically evaluating and optimizing RAG system components across multiple stages.

Findings

01 Identified key failure points in RAG components

02 Provided systematic evaluation protocols for RAG modules

03 Proposed targeted solutions to improve RAG performance

Abstract

Retrieval-augmented generation (RAG) synergizes the retrieval of pertinent data with the generative capabilities of Large Language Models (LLMs), ensuring that the generated output is not only contextually relevant but also accurate and current. We introduce XRAG, an open-source, modular codebase that facilitates exhaustive evaluation of the performance of foundational components of advanced RAG modules. These components are systematically categorized into four core phases: pre-retrieval, retrieval, post-retrieval, and generation. We systematically analyse them across reconfigured datasets, providing a comprehensive benchmark for their effectiveness. As the complexity of RAG systems continues to escalate, we underscore the critical need to identify potential failure points in RAG systems. We formulate a suite of experimental methodologies and diagnostic testing protocols to dissect the…
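To make the four phases named in the abstract concrete, here is a minimal sketch of a pipeline with one swappable function per phase. It is not XRAG's API: the corpus, the function names, and the token-overlap scoring are all hypothetical stand-ins chosen for illustration, and the generation step is a placeholder rather than an LLM call.

```python
from typing import List, Set

# Hypothetical toy corpus; not taken from XRAG or its datasets.
CORPUS = [
    "RAG combines retrieval with text generation.",
    "Rerankers reorder retrieved passages by relevance.",
    "Bananas are rich in potassium.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def pre_retrieval(query: str) -> str:
    # Phase 1 (pre-retrieval): normalise the query.
    # A real system might rewrite or expand it instead.
    return " ".join(query.lower().split())


def retrieval(query: str, k: int = 2) -> List[str]:
    # Phase 2 (retrieval): rank passages by token overlap with the query.
    # This is an assumed stand-in for a dense or sparse retriever.
    q = tokenize(query)
    ranked = sorted(CORPUS, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def post_retrieval(query: str, docs: List[str]) -> List[str]:
    # Phase 3 (post-retrieval): drop passages that share no token with
    # the query, standing in for reranking or filtering.
    q = tokenize(query)
    return [d for d in docs if q & tokenize(d)]


def generation(query: str, docs: List[str]) -> str:
    # Phase 4 (generation): placeholder for an LLM call; it only formats
    # the query together with the surviving context.
    context = " ".join(docs) if docs else "(no context)"
    return f"Q: {query} | Context: {context}"


def run_pipeline(query: str) -> str:
    q = pre_retrieval(query)
    docs = retrieval(q)
    docs = post_retrieval(q, docs)
    return generation(q, docs)


if __name__ == "__main__":
    print(run_pipeline("What do rerankers reorder?"))
```

Keeping each phase behind its own function means any one of them can be replaced and measured in isolation, which is the kind of per-component evaluation the abstract describes.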

Code & Models

Repositories

docailab/xrag (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Adam · Weight Decay · Multi-Head Attention · Layer Normalization · WordPiece · Dropout