RAGXplain: From Explainable Evaluation to Actionable Guidance of RAG Pipelines
Dvir Cohen, Tamir Houri, Lin Burg, Gilad Barkan

TL;DR
RAGXplain is an evaluation framework that transforms RAG system performance metrics into actionable insights, enabling targeted improvements through diagnostic reasoning and guided interventions.
Contribution
It introduces a structured diagnostic approach using the 'Metric Diamond' and LLM reasoning to identify failure modes and suggest targeted improvements for RAG pipelines.
Findings
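Applying RAGXplain's recommendations in a single human-guided pass consistently improves RAG pipeline performance across multiple metrics.
It uses LLM reasoning to produce natural-language explanations of failure modes and prioritized interventions.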
The framework is validated across five QA benchmarks.
Abstract
Retrieval-Augmented Generation (RAG) systems couple large language models with external knowledge, yet most evaluation methods report aggregate scores that reveal whether a pipeline underperforms but not where or why. We introduce RAGXplain, an evaluation framework that translates performance metrics into actionable guidance. RAGXplain structures evaluation around a 'Metric Diamond' connecting user input, retrieved context, generated answer, and (when available) ground truth via six diagnostic dimensions. It uses LLM reasoning to produce natural-language failure-mode explanations and prioritized interventions. Across five QA benchmarks, applying RAGXplain's recommendations in a single human-guided pass consistently improves RAG pipeline performance across multiple metrics. We release RAGXplain as open source to support reproducibility and community adoption.
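The count of six dimensions in the abstract matches the number of ways to pair four nodes. The sketch below illustrates that reading only: it assumes each diagnostic dimension scores one pair of nodes, which the abstract does not state outright. The node identifiers, the helper names `diamond_edges` and `weakest_edges`, and the 0.5 threshold are hypothetical and are not taken from the RAGXplain release.

```python
from itertools import combinations

# The four corners named in the abstract (identifier spellings are assumptions).
NODES = ("user_input", "retrieved_context", "generated_answer", "ground_truth")


def diamond_edges(has_ground_truth=True):
    """Return the node pairs that can be scored.

    Assumption: each diagnostic dimension compares exactly one pair of nodes.
    Without ground truth, only pairs among the first three nodes remain.
    """
    nodes = NODES if has_ground_truth else NODES[:-1]
    return list(combinations(nodes, 2))


def weakest_edges(scores, threshold=0.5):
    """Return (pair, score) entries below the threshold, lowest score first."""
    low = [(pair, score) for pair, score in scores.items() if score < threshold]
    return sorted(low, key=lambda item: item[1])
```

Under this assumption, `diamond_edges()` yields six pairs and `diamond_edges(False)` yields three, so a pipeline evaluated without ground truth would have fewer dimensions to inspect. Passing per-pair scores to `weakest_edges` gives a simple ordering of where to look first; the framework itself relies on LLM reasoning rather than a fixed threshold.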
