RAGProbe: An Automated Approach for Evaluating RAG Applications
Shangeetha Sivasothy, Scott Barnett, Stefanus Kurniawan, Zafaryab Rasool, Rajesh Vasa

TL;DR
This paper introduces RAGProbe, an automated method for evaluating Retrieval Augmented Generation pipelines by generating question-answer variations to identify failure points, outperforming existing metrics and aiding continuous quality monitoring.
Contribution
The paper presents a novel schema and template-based approach for automated RAG pipeline evaluation, addressing limitations of manual assessment and existing metrics.
Findings
High failure rates for multi-question prompts (91%) and single-document questions (78%)
60% failure rate in academic datasets, 53% and 62% in open-domain datasets
Automated approach increases failure detection by 51% on average
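The percentages above are failure rates. This summary does not give the paper's exact definition, so the sketch below assumes the simplest reading: the fraction of generated questions in a group whose answer was judged incorrect. The function name `failure_rates` and the `(scenario, passed)` input shape are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def failure_rates(results):
    """Map each scenario label to the fraction of its questions that failed.

    `results` is an iterable of (scenario, passed) tuples. This input shape
    is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for scenario, passed in results:
        totals[scenario] += 1
        if not passed:
            failures[scenario] += 1
    # Every scenario in `totals` has at least one entry, so no division by zero.
    return {scenario: failures[scenario] / totals[scenario] for scenario in totals}
```

Grouping by scenario rather than reporting a single overall number is what makes it possible to compare, for example, multi-question prompts against other question types.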
Abstract
Retrieval Augmented Generation (RAG) is increasingly being used when building Generative AI applications. Evaluating these applications and RAG pipelines is mostly done manually, via a trial and error process. Automating evaluation of RAG pipelines requires overcoming challenges such as context misunderstanding, wrong format, incorrect specificity, and missing content. Prior works therefore focused on improving evaluation metrics as well as enhancing components within the pipeline using available question and answer datasets. However, they have not focused on 1) providing a schema for capturing different types of question-answer pairs or 2) creating a set of templates for generating question-answer pairs that can support automation of RAG pipeline evaluation. In this paper, we present a technique for generating variations in question-answer pairs to trigger failures in RAG pipelines. We…
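As a rough illustration of what a schema and a set of templates for question-answer pairs might look like, here is a minimal sketch. Every name in it (`QAPair`, `TEMPLATES`, the scenario labels, the template strings) is hypothetical and chosen for this example; the abstract does not specify the paper's actual schema or templates.

```python
from dataclasses import dataclass

# Hypothetical schema: field names are illustrative, not taken from the paper.
@dataclass(frozen=True)
class QAPair:
    scenario: str          # label for the kind of question being generated
    question: str
    expected_answer: str
    source_docs: tuple     # identifiers of the documents the answer depends on

# Hypothetical templates keyed by scenario label.
TEMPLATES = {
    "single_fact": "What is {attribute} of {entity}?",
    "combined": "{first} Also, {second}",
}

def single_fact(entity, attribute, answer, doc_id):
    """Fill the single-fact template to produce one question-answer pair."""
    question = TEMPLATES["single_fact"].format(attribute=attribute, entity=entity)
    return QAPair("single_fact", question, answer, (doc_id,))

def combine(a, b):
    """Merge two pairs into one prompt that asks both questions at once."""
    question = TEMPLATES["combined"].format(first=a.question, second=b.question)
    answer = f"{a.expected_answer} {b.expected_answer}"
    # Keep document ids in first-seen order without duplicates.
    docs = tuple(dict.fromkeys(a.source_docs + b.source_docs))
    return QAPair("combined", question, answer, docs)
```

For instance, `combine(single_fact("France", "the capital", "Paris.", "doc1"), single_fact("Japan", "the capital", "Tokyo.", "doc2"))` yields one prompt containing both questions, a variation of the kind the findings report as failing most often.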
