Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage
Kaige Xie, Philippe Laban, Prafulla Kumar Choubey, Caiming Xiong, Chien-Sheng Wu

TL;DR
This paper introduces a sub-question coverage evaluation framework for RAG systems, revealing coverage gaps and showing that leveraging sub-questions improves response quality and ranking accuracy.
Contribution
It proposes a novel sub-question-based evaluation protocol and demonstrates its effectiveness in analyzing and enhancing RAG system responses.
Findings
All answer engines cover core sub-questions more often than background or follow-up sub-questions.
They miss around 50% of core sub-questions.
Sub-question coverage metrics achieve 82% ranking accuracy.
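A ranking-accuracy figure like this one is usually computed pairwise: for each pair of responses where a reference judgment prefers one over the other, check whether the metric scores the preferred response higher. The sketch below is a minimal illustration of that arithmetic, assuming coverage scores and reference preference pairs are already available; the function name `pairwise_ranking_accuracy`, the response ids, the scores, and the choice to count metric ties as disagreements are all hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def pairwise_ranking_accuracy(
    scores: Dict[str, float],
    preferences: List[Tuple[str, str]],
) -> float:
    """Fraction of (preferred, other) pairs on which the metric agrees.

    scores: hypothetical coverage score per response id.
    preferences: (winner, loser) pairs from some reference judgment.
    Ties in the metric count as disagreements (an assumption).
    """
    if not preferences:
        raise ValueError("need at least one preference pair")
    agree = sum(1 for win, lose in preferences if scores[win] > scores[lose])
    return agree / len(preferences)


# Toy data: four responses with made-up coverage scores.
scores = {"A": 0.75, "B": 0.50, "C": 0.25, "D": 0.10}
prefs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]
print(pairwise_ranking_accuracy(scores, prefs))  # 0.75
```

Here the metric agrees with three of the four reference pairs; only ("D", "C") is misordered, giving 0.75.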
Abstract
Evaluating retrieval-augmented generation (RAG) systems remains challenging, particularly for open-ended questions that lack definitive answers and require coverage of multiple sub-topics. In this paper, we introduce a novel evaluation framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question. We propose decomposing questions into sub-questions and classifying them into three types -- core, background, and follow-up -- to reflect their roles and importance. Using this categorization, we introduce a fine-grained evaluation protocol that provides insights into the retrieval and generation characteristics of RAG systems, including three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat. Interestingly, we find that while all answer engines cover core sub-questions more often than background or follow-up…
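The per-type comparison described above (how often core vs. background vs. follow-up sub-questions are covered) reduces to a coverage rate computed separately for each type. The sketch below assumes the decomposition into sub-questions, their type labels, and a per-sub-question "covered / not covered" judgment already exist; how the paper produces those is not reproduced here. `SubQuestion`, `coverage_by_type`, and the example question are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set

# The three sub-question types named in the abstract.
TYPES = ("core", "background", "follow-up")


@dataclass(frozen=True)
class SubQuestion:
    qid: str
    text: str
    kind: str  # one of TYPES


def coverage_by_type(subqs: Iterable[SubQuestion], covered: Set[str]) -> Dict[str, float]:
    """Share of sub-questions of each type that a response covers.

    covered: ids judged as addressed by the response (assumed given).
    Types with no sub-questions are omitted from the result.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for sq in subqs:
        if sq.kind not in TYPES:
            raise ValueError(f"unknown sub-question type: {sq.kind}")
        totals[sq.kind] += 1
        if sq.qid in covered:
            hits[sq.kind] += 1
    return {k: hits[k] / totals[k] for k in TYPES if totals[k]}


# Hypothetical decomposition of "Is drinking coffee healthy?"
subqs = [
    SubQuestion("q1", "What are the health benefits of coffee?", "core"),
    SubQuestion("q2", "What are the health risks of coffee?", "core"),
    SubQuestion("q3", "What compounds does coffee contain?", "background"),
    SubQuestion("q4", "How much coffee per day is safe?", "follow-up"),
]
print(coverage_by_type(subqs, covered={"q1", "q3"}))
# {'core': 0.5, 'background': 1.0, 'follow-up': 0.0}
```

Reporting rates per type, rather than one pooled score, is what makes a pattern such as "core covered more often than follow-up, yet half of core still missed" visible.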
