Controlled Retrieval-augmented Context Evaluation for Long-form RAG
Jia-Huei Ju, Suzan Verberne, Maarten de Rijke, Andrew Yates

TL;DR
This paper introduces CRUX, an evaluation framework for assessing retrieval-augmented contexts in long-form RAG tasks. It argues that relevance-based ranking metrics alone may not capture context quality, and it finds significant room for improvement in current retrieval methods.
Contribution
The paper proposes CRUX, a novel human-centered, question-based evaluation framework that directly measures the quality of retrieval-augmented contexts in long-form generation tasks.
Findings
CRUX provides a more reflective and diagnostic evaluation of retrieval quality than relevance-based ranking metrics.
Current retrieval methods show substantial room for improvement.
CRUX enables fine-grained assessment of retrieval relevance in long-form RAG.
Abstract
Retrieval-augmented generation (RAG) enhances large language models by incorporating context retrieved from external knowledge sources. While the effectiveness of the retrieval module is typically evaluated with relevance-based ranking metrics, such metrics may be insufficient to reflect the retrieval's impact on the final RAG result, especially in long-form generation scenarios. We argue that providing a comprehensive retrieval-augmented context is important for long-form RAG tasks like report generation and propose metrics for assessing the context independent of generation. We introduce CRUX, a Controlled Retrieval-aUgmented conteXt evaluation framework designed to directly assess retrieval-augmented contexts. This framework uses human-written summaries to control the information scope of knowledge, enabling us to measure how well the context…
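The idea of question-based context evaluation can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract shown here is truncated and does not say how questions are formed or how answerability is judged. The functions `keyword_judge` and `coverage`, and the representation of each question as a list of keywords, are hypothetical and chosen only so the example runs. A real judge would be substituted through the `judge` parameter.

```python
from typing import Callable, Iterable, List


def keyword_judge(question_keywords: Iterable[str], context: str) -> bool:
    """Toy stand-in judge (an assumption, not the paper's method): a question
    counts as answerable when every keyword appears in the context."""
    text = context.lower()
    return all(kw.lower() in text for kw in question_keywords)


def coverage(
    passages: List[str],
    questions: List[List[str]],
    judge: Callable[[Iterable[str], str], bool] = keyword_judge,
) -> float:
    """Fraction of questions that the concatenated retrieved context answers."""
    if not questions:
        return 0.0
    context = " ".join(passages)
    answered = sum(1 for q in questions if judge(q, context))
    return answered / len(questions)


# Hypothetical data: three questions, two retrieved passages.
questions = [["capital", "france"], ["population", "paris"], ["river", "seine"]]
passages = ["Paris is the capital of France.", "The Seine is a river in Paris."]
score = coverage(passages, questions)  # two of the three questions are covered
```

Scoring the context directly, before any text is generated, is what lets a measure like this separate retrieval quality from generator quality.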
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
Methods: Attention Dropout · Dropout · Byte Pair Encoding · Softmax · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · BERT · BART
