Unanswerability Evaluation for Retrieval Augmented Generation
Xiangyu Peng, Prafulla Kumar Choubey, Caiming Xiong, Chien-Sheng Wu

TL;DR
This paper introduces UAEval4RAG, a framework for evaluating how well retrieval-augmented generation systems can reject unanswerable queries, addressing a gap in existing evaluation methods.
Contribution
We propose a new evaluation framework with a taxonomy of unanswerable categories and automatic query synthesis, enabling comprehensive assessment of RAG system robustness.
Findings
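The choice of RAG components (retrieval model, rewriting method, reranker, language model) significantly affects the trade-off between answering answerable queries and rejecting unanswerable ones.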
Prompt design influences the balance between answer accuracy and rejection rate.
Our framework reveals hidden performance trade-offs in RAG systems.
Abstract
Existing evaluation frameworks for retrieval-augmented generation (RAG) systems focus on answerable queries, but they overlook the importance of appropriately rejecting unanswerable requests. In this paper, we introduce UAEval4RAG, a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively. We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries for any given knowledge base with unanswered ratio and acceptable ratio metrics. We conduct experiments with various RAG components, including retrieval models, rewriting methods, rerankers, language models, and prompting strategies, and reveal hidden trade-offs in performance of RAG systems. Our findings highlight the critical role of component selection and prompt design in optimizing RAG systems to balance the accuracy of answerable…
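The abstract names two metrics, unanswered ratio and acceptable ratio, without defining them here. As a rough sketch only, the Python below assumes that the unanswered ratio is the fraction of unanswerable queries the system declines to answer, and that the acceptable ratio is the fraction of responses a judge marks as acceptable. The `JudgedResponse` type, its fields, and both function names are hypothetical and are not taken from UAEval4RAG.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JudgedResponse:
    """One RAG response to an unanswerable query (hypothetical record type)."""
    query: str
    rejected: bool    # assumed label: the system declined to answer
    acceptable: bool  # assumed label: a judge found the response acceptable


def unanswered_ratio(responses: Sequence[JudgedResponse]) -> float:
    """Fraction of unanswerable queries the system declined (assumed definition)."""
    if not responses:
        raise ValueError("need at least one judged response")
    return sum(r.rejected for r in responses) / len(responses)


def acceptable_ratio(responses: Sequence[JudgedResponse]) -> float:
    """Fraction of responses judged acceptable (assumed definition)."""
    if not responses:
        raise ValueError("need at least one judged response")
    return sum(r.acceptable for r in responses) / len(responses)


# Toy labels, invented purely for illustration.
judged = [
    JudgedResponse("q1", rejected=True, acceptable=True),
    JudgedResponse("q2", rejected=True, acceptable=False),
    JudgedResponse("q3", rejected=False, acceptable=False),
    JudgedResponse("q4", rejected=True, acceptable=True),
]
print(unanswered_ratio(judged), acceptable_ratio(judged))
```

Keeping the two labels separate reflects the idea that a system can decline a query and still respond poorly, so the two ratios need not agree.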
Taxonomy
Topics: Tactile and Sensory Interactions
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Adam · Layer Normalization · Residual Connection · Weight Decay · WordPiece · Softmax
