Corrective Retrieval Augmented Generation
Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling

TL;DR
CRAG enhances retrieval-augmented generation by evaluating retrieval quality, integrating web searches, and filtering key information, thereby improving robustness and performance across diverse tasks.
Contribution
This paper introduces CRAG, a novel framework that assesses retrieval quality, extends retrieval with web searches, and filters key information to improve RAG robustness.
Findings
CRAG significantly improves RAG performance on multiple datasets.
The retrieval evaluator effectively assesses document relevance.
Web search augmentation enhances retrieval quality.
Abstract
Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval…
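The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the action names (`use_retrieved`, `web_search`, `combine`), the thresholds `upper` and `lower`, and the `evaluator` and `web_search` callables are all assumptions introduced here, since the summary above gives none of them.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the paper summary):
Evaluator = Callable[[str, str], float]  # (query, document) -> confidence in [0, 1]
WebSearch = Callable[[str], List[str]]   # query -> list of web passages


def choose_action(scores: List[float], upper: float = 0.6, lower: float = 0.2) -> str:
    """Map per-document confidence scores to one retrieval action.

    The thresholds are illustrative placeholders, not published values.
    """
    best = max(scores, default=0.0)
    if best >= upper:
        return "use_retrieved"  # at least one document looks relevant
    if best < lower:
        return "web_search"     # nothing looks relevant: fall back to the web
    return "combine"            # uncertain: use both sources


def corrective_retrieve(
    query: str,
    docs: List[str],
    evaluator: Evaluator,
    web_search: WebSearch,
    upper: float = 0.6,
    lower: float = 0.2,
) -> Tuple[str, List[str]]:
    """Score each retrieved document, pick an action, and return the knowledge to use."""
    scores = [evaluator(query, doc) for doc in docs]
    action = choose_action(scores, upper, lower)
    # Simple filtering step (an assumption): drop documents scored below `lower`.
    kept = [doc for doc, score in zip(docs, scores) if score >= lower]
    if action == "use_retrieved":
        return action, kept
    if action == "web_search":
        return action, web_search(query)
    return action, kept + web_search(query)
```

In practice the `evaluator` would be a trained scoring model and `web_search` a call to a search API; both are stubbed as plain callables here so the routing logic can be read on its own.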
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The authors propose a cost-effective method to filter, refine, and improve documents in RAG. This method can be applied in various RAG pipelines. 2. Detailed experiments show the effectiveness, robustness, and efficiency of CRAG.
1. CRAG cannot handle multi-hop QA tasks. Since the confidence score is calculated between a query and a single document, it is hard for the evaluator to judge complex relationships across multiple documents. 2. The filtering and refinement processes depend on the evaluator, which can potentially filter out useful information. Besides, the paper lacks an evaluation and analysis of the filtering performance. What if the evaluator filters out helpful results? 3. It is challenging to manually
1. The paper addresses an important research question. Recognizing the imperfections in existing retriever technologies, this paper focuses on how to mitigate these issues within the Retrieval-Augmented Generation (RAG) paradigm, thus contributing valuable insights toward developing more robust RAG systems. 2. The paper presents compelling experimental results.
1. **The experiments in this paper are insufficient, particularly in the selection of baselines.** The paper lacks comparison with highly relevant prior works that have proposed corrective strategies for Retrieval-Augmented Generation (RAG). Notably, it does not include baselines such as RARR [1] and DRAGIN [2], which are essential for contextualizing the contribution of this work within the existing literature. Including these baselines would strengthen the evaluation and provide a clearer und
1. Compared to Standard RAG, CRAG shows significant improvements, although experiments were only conducted on a 7B model. 2. Knowledge Correction is necessary as it can assess the accuracy of retrieval results and prevent irrelevant results from impacting model performance. 3. The paper includes numerous ablation studies, making the experiments overall solid.
1. A core concern is latency: the introduction of the Knowledge Correction phase significantly increases delays, which could hinder CRAG's practical application, yet the authors only discuss the time consumption of the generation phase. 2. Additionally, the accuracy of the T5-based Retrieval Evaluator is concerning. It is also unclear how the evaluator's accuracy impacts CRAG's generation. 3. The lack of case analysis is notable. It would be beneficial to understand how
1. The presentation is clear and easy to follow. 2. The paper presents comprehensive experiments across multiple datasets to show the performance of the proposed method.
1. The technical contribution of this paper to the RAG area is limited. Components of the proposed method are similar to existing methods in Information Retrieval or RAG. For example, assessing the quality of retrieved texts to determine whether to retrieve or not has been fully studied in existing RAG methods such as Self-RAG [1] and RetRobust [2]. Using large-scale web resources to replace the static corpus cannot be seen as a technical
Taxonomy
Topics: Speech and dialogue systems
Methods: Focus
