RAGVUE: A Diagnostic View for Explainable and Automated Evaluation of Retrieval-Augmented Generation
Keerthana Murugaraj, Salima Lamsiyah, Martin Theobald

TL;DR
RAGVUE is an explainable framework for automated, reference-free evaluation of RAG systems that shows whether errors arise from retrieval, reasoning, or grounding.
Contribution
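The paper introduces RAGVUE, a diagnostic tool that decomposes RAG performance into retrieval quality, answer relevance and completeness, strict claim-level faithfulness, and judge calibration. Each metric comes with a structured explanation, and the tool supports both manual metric selection and fully automated agentic evaluation.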
Findings
RAGVUE uncovers fine-grained failures overlooked by existing metrics.
It offers a transparent, explainable evaluation process for RAG systems.
The framework is publicly available with a Python API, a CLI, and a local Streamlit interface for easy integration.
Abstract
Evaluating Retrieval-Augmented Generation (RAG) systems remains a challenging task: existing metrics often collapse heterogeneous behaviors into single scores and provide little insight into whether errors arise from retrieval, reasoning, or grounding. In this paper, we introduce RAGVUE, a diagnostic and explainable framework for automated, reference-free evaluation of RAG pipelines. RAGVUE decomposes RAG behavior into retrieval quality, answer relevance and completeness, strict claim-level faithfulness, and judge calibration. Each metric includes a structured explanation, making the evaluation process transparent. Our framework supports both manual metric selection and fully automated agentic evaluation. It also provides a Python API, CLI, and a local Streamlit interface for interactive usage. In comparative experiments, RAGVUE surfaces fine-grained failures that existing tools such as…
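To make the idea of a claim-level metric paired with a structured explanation concrete, here is a minimal sketch in Python. It is not RAGVUE's API: the names `MetricResult`, `toy_supported`, and `claim_level_faithfulness` are hypothetical, and the word-overlap check is only a stand-in for whatever judge the framework actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class MetricResult:
    """Hypothetical container: a score plus a structured explanation."""
    name: str
    score: float
    explanation: dict = field(default_factory=dict)


def toy_supported(claim: str, context: str) -> bool:
    """Assumed stand-in for a judge: every word of the claim must appear in the context."""
    context_words = set(context.lower().split())
    return all(word in context_words for word in claim.lower().split())


def claim_level_faithfulness(claims: list[str], context: str) -> MetricResult:
    """Score is the fraction of claims judged supported; unsupported claims are listed."""
    unsupported = [c for c in claims if not toy_supported(c, context)]
    score = 1.0 if not claims else (len(claims) - len(unsupported)) / len(claims)
    return MetricResult(
        name="claim_faithfulness",
        score=score,
        explanation={"n_claims": len(claims), "unsupported": unsupported},
    )


result = claim_level_faithfulness(
    claims=["paris is the capital of france", "paris has ten million residents"],
    context="Paris is the capital of France",
)
print(result.score, result.explanation["unsupported"])
```

Returning the unsupported claims alongside the score is what lets a reader see why an answer lost points, rather than only that it did.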
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Explainable Artificial Intelligence (XAI)
