Evaluating Retrieval Quality in Retrieval-Augmented Generation
Alireza Salemi, Hamed Zamani

TL;DR
This paper introduces eRAG, an evaluation method for the retrieval component of retrieval-augmented generation. It correlates more strongly with downstream performance than evaluation based on query-document relevance labels, and it is cheaper to compute than end-to-end evaluation.
Contribution
eRAG provides a new evaluation framework that assesses retrieval quality based on downstream task performance, reducing computational costs and improving correlation with actual RAG system effectiveness.
Findings
eRAG correlates more strongly with downstream RAG performance than baseline methods, with improvements in Kendall's τ of up to 0.494.
eRAG significantly reduces GPU memory usage and runtime relative to end-to-end evaluation.
The method is effective across diverse datasets.
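To make the correlation figure concrete, the following is a minimal sketch of how a Kendall's τ between a retrieval-evaluation score and downstream performance could be computed. It implements the simple tau-a variant in pure Python; which variant the paper uses is not stated on this page, and the function name `kendall_tau` and the toy numbers are illustrative assumptions, not results from the paper.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy, made-up values: one retrieval-evaluation score and one downstream
# score per query (or per retrieval configuration).
retrieval_scores = [0.2, 0.5, 0.9, 0.4]
downstream_scores = [0.1, 0.3, 0.8, 0.6]

tau = kendall_tau(retrieval_scores, downstream_scores)
print(f"Kendall's tau = {tau:.3f}")
```

A value near 1 means the retrieval metric orders the items almost exactly as downstream performance does; a value near 0 means the two orderings are unrelated.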
Abstract
Evaluating retrieval-augmented generation (RAG) presents challenges, particularly for retrieval models within these systems. Traditional end-to-end evaluation methods are computationally expensive. Furthermore, evaluation of the retrieval model's performance based on query-document relevance labels shows a small correlation with the RAG system's downstream performance. We propose a novel evaluation approach, eRAG, where each document in the retrieval list is individually utilized by the large language model within the RAG system. The output generated for each document is then evaluated based on the downstream task ground truth labels. In this manner, the downstream performance for each document serves as its relevance label. We employ various downstream task metrics to obtain document-level annotations and aggregate them using set-based or ranking metrics. Extensive experiments on a…
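The per-document procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_generate` is a hypothetical stand-in for the LLM in the RAG system, exact match is just one possible downstream metric, and precision@k and average precision are used as examples of a set-based and a ranking aggregation.

```python
from typing import Callable, List


def exact_match(output: str, gold: str) -> float:
    """One possible downstream metric: 1.0 on a case-insensitive match."""
    return float(output.strip().lower() == gold.strip().lower())


def erag_document_labels(
    query: str,
    documents: List[str],
    gold: str,
    generate: Callable[[str, str], str],
    metric: Callable[[str, str], float] = exact_match,
) -> List[float]:
    """Run the generator once per retrieved document and score each output
    against the ground truth; the score is that document's relevance label."""
    return [metric(generate(query, doc), gold) for doc in documents]


def precision_at_k(labels: List[float], k: int) -> float:
    """Set-based aggregation: mean label over the top-k documents."""
    top = labels[:k]
    return sum(top) / len(top) if top else 0.0


def average_precision(labels: List[float]) -> float:
    """Ranking aggregation for binary labels, normalized by the number of
    relevant documents found in the retrieved list."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def toy_generate(query: str, document: str) -> str:
    """Hypothetical stand-in for the LLM: answers with the document's last word."""
    return document.split()[-1]


docs = [
    "The capital of France is Paris",
    "Berlin is in Germany",
    "The largest city in France is Paris",
]
labels = erag_document_labels("What is the capital of France?", docs, "Paris", toy_generate)
print(labels)                     # per-document relevance labels
print(precision_at_k(labels, 2))  # set-based score
print(average_precision(labels))  # ranking score
```

Because each document is passed to the generator on its own, the input per call is short, which is consistent with the lower memory cost claimed relative to end-to-end evaluation over the full retrieved list.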
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Speech and dialogue systems
Methods: Attention Is All You Need · Weight Decay · Byte Pair Encoding · Dense Connections · Residual Connection · Softmax · Adam · Linear Warmup With Linear Decay · Layer Normalization
