CCRS: A Zero-Shot LLM-as-a-Judge Framework for Comprehensive RAG Evaluation
Aashiq Muhamed

TL;DR
This paper introduces CCRS, a zero-shot, LLM-based evaluation framework that comprehensively assesses RAG system outputs across multiple quality dimensions with high efficiency.
Contribution
The paper presents CCRS, a novel suite of five metrics that uses a single pretrained LLM as a zero-shot, end-to-end judge of RAG outputs, improving evaluation efficiency and discriminative power.
Findings
CCRS effectively distinguishes performance differences among RAG systems.
CCRS outperforms or matches complex frameworks like RAGChecker in key evaluation aspects.
CCRS is significantly more computationally efficient than existing multi-stage evaluation methods.
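To make the single-call, zero-shot judging idea above concrete, here is a minimal sketch of one judge call. It is an illustration under stated assumptions, not the paper's implementation: the prompt wording, the 0 to 100 scale, the `judge` function, and the `llm` callable are all hypothetical, and the dimension name passed in is whatever quality aspect the caller wants rated.

```python
import re
from typing import Callable

# Hypothetical prompt template; the paper's actual prompts are not shown here.
PROMPT = (
    "You are evaluating the output of a retrieval-augmented generation system.\n"
    "Question: {query}\n"
    "Retrieved context: {context}\n"
    "Answer: {answer}\n"
    "Rate the answer's {dimension} on a scale from 0 to 100. "
    "Reply with the number only."
)


def judge(
    llm: Callable[[str], str],
    query: str,
    context: str,
    answer: str,
    dimension: str,
) -> float:
    """Score one quality dimension with a single zero-shot LLM call.

    `llm` is any function mapping a prompt string to a reply string
    (an assumption; plug in whichever client you use).
    Returns a score normalized to the range [0.0, 1.0].
    """
    reply = llm(
        PROMPT.format(
            query=query, context=context, answer=answer, dimension=dimension
        )
    )
    # Take the first number in the reply; judges do not always obey "number only".
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score in judge reply: {reply!r}")
    score = float(match.group())
    # Clamp out-of-range replies before normalizing.
    return min(max(score, 0.0), 100.0) / 100.0
```

Because each dimension costs exactly one model call with no intermediate step such as claim extraction, the evaluation cost in this sketch grows linearly with the number of dimensions scored, which is the kind of efficiency argument made against multi-stage pipelines.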
Abstract
RAG systems enhance LLMs by incorporating external knowledge, which is crucial for domains that demand factual accuracy and up-to-date information. However, evaluating the multifaceted quality of RAG outputs, spanning aspects such as contextual coherence, query relevance, factual correctness, and informational completeness, poses significant challenges. Existing evaluation methods often rely on simple lexical overlap metrics, which are inadequate for capturing these nuances, or involve complex multi-stage pipelines with intermediate steps like claim extraction or require finetuning specialized judge models, hindering practical efficiency. To address these limitations, we propose CCRS (Contextual Coherence and Relevance Score), a novel suite of five metrics that utilizes a single, powerful, pretrained LLM as a zero-shot, end-to-end judge. CCRS evaluates: Contextual Coherence (CC),…
