RAG-Zeval: Towards Robust and Interpretable Evaluation on RAG Responses through End-to-End Rule-Guided Reasoning

Kun Li; Yunxiang Li; Tianhua Zhang; Hongyin Luo; Xixin Wu; James Glass; Helen Meng

arXiv:2505.22430 · cs.CL · May 29, 2025


TL;DR

RAG-Zeval is an end-to-end, rule-guided evaluation framework for RAG systems that uses reinforcement learning to produce accurate, interpretable assessments at a lower computational cost than existing methods.

Contribution

It introduces a novel rule-guided evaluation method, trained with reinforcement learning, that assesses the faithfulness and correctness of RAG responses and makes those assessments interpretable.

Findings

1. Achieves the strongest correlation with human judgments.
2. Outperforms larger LLM-based baselines in evaluation accuracy.
3. Provides more interpretable response evaluations.

Abstract

Robust evaluation is critical for deploying trustworthy retrieval-augmented generation (RAG) systems. However, current LLM-based evaluation frameworks predominantly rely on directly prompting resource-intensive models with complex multi-stage prompts, which underutilizes the models' reasoning capabilities and introduces significant computational cost. In this paper, we present RAG-Zeval (RAG-Zero Evaluator), a novel end-to-end framework that formulates faithfulness and correctness evaluation as a rule-guided reasoning task. Our approach trains evaluators with reinforcement learning, enabling compact models to generate comprehensive and sound assessments with detailed explanations in a single pass. We introduce a ranking-based outcome reward mechanism, which uses preference judgments rather than absolute scores, to address the difficulty of obtaining precise pointwise reward signals. To this end, we…
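The abstract states that the outcome reward is ranking-based and built on preference judgments rather than absolute scores, but it is truncated before the exact formulation. A minimal sketch of one way such a reward could be computed follows. It assumes the evaluator assigns a score to each of several candidate responses and that the reward is the fraction of candidate pairs whose predicted order agrees with a reference ranking (a pairwise-concordance reward, which is not necessarily the authors' formulation). The function name `ranking_reward` and its arguments are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def ranking_reward(predicted: Sequence[float], reference_rank: Sequence[int]) -> float:
    """Hypothetical ranking-based outcome reward (pairwise concordance).

    predicted:      evaluator scores for N candidate responses (higher = better).
    reference_rank: reference rank of each candidate (0 = best).

    Returns the fraction of candidate pairs, among those the reference ranks
    differently, whose order in `predicted` agrees with the reference.
    A tie in the predicted scores counts as disagreement.
    """
    if len(predicted) != len(reference_rank):
        raise ValueError("predicted and reference_rank must have the same length")

    agree = 0
    total = 0
    for i, j in combinations(range(len(predicted)), 2):
        if reference_rank[i] == reference_rank[j]:
            continue  # the reference expresses no preference for this pair
        total += 1
        ref_prefers_i = reference_rank[i] < reference_rank[j]
        pred_prefers_i = predicted[i] > predicted[j]
        pred_prefers_j = predicted[j] > predicted[i]
        if (ref_prefers_i and pred_prefers_i) or (not ref_prefers_i and pred_prefers_j):
            agree += 1

    # With no comparable pairs there is no preference signal to reward.
    return agree / total if total else 0.0


if __name__ == "__main__":
    # Reference: candidate 0 best, candidate 2 second, candidate 1 worst.
    print(ranking_reward([0.9, 0.4, 0.7], [0, 2, 1]))  # all three pairs agree
    print(ranking_reward([0.2, 0.4, 0.7], [0, 2, 1]))  # only one of three pairs agrees
```

The design point this illustrates is the one the abstract motivates: the reward only checks whether the evaluator orders responses the same way as the reference, so it never needs a calibrated absolute score for any single response.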


Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling