RAGVUE: A Diagnostic View for Explainable and Automated Evaluation of Retrieval-Augmented Generation

Keerthana Murugaraj; Salima Lamsiyah; Martin Theobald

arXiv:2601.04196·cs.CL·January 9, 2026

TL;DR

RAGVUE is a comprehensive, explainable framework for automated evaluation of RAG systems, providing detailed insights into retrieval, reasoning, and grounding errors to improve understanding and development.

Contribution

The paper introduces RAGVUE, a diagnostic tool that decomposes RAG performance into multiple explainable metrics, each with a structured explanation, and supports both manual and automated evaluation.

Findings

01. RAGVUE uncovers fine-grained failures overlooked by existing metrics.

02. It offers a transparent, explainable evaluation process for RAG systems.

03. The framework is publicly available with APIs and interfaces for easy integration.

Abstract

Evaluating Retrieval-Augmented Generation (RAG) systems remains a challenging task: existing metrics often collapse heterogeneous behaviors into single scores and provide little insight into whether errors arise from retrieval, reasoning, or grounding. In this paper, we introduce RAGVUE, a diagnostic and explainable framework for automated, reference-free evaluation of RAG pipelines. RAGVUE decomposes RAG behavior into retrieval quality, answer relevance and completeness, strict claim-level faithfulness, and judge calibration. Each metric includes a structured explanation, making the evaluation process transparent. Our framework supports both manual metric selection and fully automated agentic evaluation. It also provides a Python API, CLI, and a local Streamlit interface for interactive usage. In comparative experiments, RAGVUE surfaces fine-grained failures that existing tools such as…
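The idea of reporting separate, explained metrics instead of one collapsed score can be illustrated with a minimal Python sketch. This is not RAGVUE's API: `MetricResult`, `claim_faithfulness`, and `weakest` are hypothetical names invented for this example, the per-claim verdicts are assumed to come from some external judge, and scoring faithfulness as the fraction of supported claims is an assumption, not the paper's definition.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """One metric with its score in [0, 1] and a human-readable explanation."""
    name: str
    score: float
    explanation: str


def claim_faithfulness(verdicts: dict) -> MetricResult:
    """Score faithfulness from per-claim verdicts (claim text -> supported?).

    Assumption: the score is the fraction of claims judged supported.
    """
    unsupported = [claim for claim, ok in verdicts.items() if not ok]
    if not verdicts:
        score = 1.0  # no claims means nothing can be unfaithful
    else:
        score = (len(verdicts) - len(unsupported)) / len(verdicts)
    if unsupported:
        explanation = "unsupported claims: " + "; ".join(unsupported)
    else:
        explanation = "all claims supported"
    return MetricResult("faithfulness", score, explanation)


def weakest(results: list) -> MetricResult:
    """Return the lowest-scoring metric, i.e. where to look first."""
    return min(results, key=lambda r: r.score)


# Hypothetical evaluation of a single RAG answer.
faith = claim_faithfulness({
    "The treaty was signed in 1990": True,
    "It was ratified by 40 states": False,
})
results = [
    MetricResult("retrieval_quality", 0.9, "relevant passages retrieved"),
    MetricResult("answer_relevance", 0.9, "answer addresses the question"),
    faith,
]
mean_score = sum(r.score for r in results) / len(results)
worst = weakest(results)
print(f"collapsed mean: {mean_score:.2f}")
print(f"weakest metric: {worst.name} ({worst.explanation})")
```

A single averaged score would hide that only the faithfulness metric is weak here, whereas the per-metric view names both the failing dimension and the specific unsupported claim.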

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Explainable Artificial Intelligence (XAI)