Corrective Retrieval Augmented Generation

Shi-Qi Yan; Jia-Chen Gu; Yun Zhu; Zhen-Hua Ling

arXiv:2401.15884·cs.CL·October 8, 2024·23 cites

TL;DR

CRAG enhances retrieval-augmented generation by evaluating retrieval quality, integrating web searches, and filtering key information, thereby improving robustness and performance across diverse tasks.

Contribution

This paper introduces CRAG, a novel framework that assesses retrieval quality, extends retrieval with web searches, and filters key information to improve RAG robustness.

Findings

01. CRAG significantly improves RAG performance on multiple datasets.
02. The retrieval evaluator effectively assesses document relevance.
03. Web search augmentation enhances retrieval quality.

Abstract

Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval…
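The evaluator-then-action flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three action labels, the `upper` and `lower` thresholds and their default values, and the `score` and `web_search` callables are all assumptions introduced here, standing in for the lightweight retrieval evaluator and the web search extension.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Action(Enum):
    # Hypothetical action labels, chosen for illustration only.
    CORRECT = "correct"      # at least one document scores as clearly relevant
    INCORRECT = "incorrect"  # every document scores as clearly irrelevant
    AMBIGUOUS = "ambiguous"  # the evaluator is not confident either way


def choose_action(
    scores: Sequence[float], upper: float = 0.5, lower: float = -0.5
) -> Action:
    """Map per-document confidence scores to one retrieval action.

    The thresholds are assumed values, not numbers from the paper.
    """
    if not scores:
        # Nothing was retrieved, so fall back to the external source.
        return Action.INCORRECT
    best = max(scores)
    if best > upper:
        return Action.CORRECT
    if best < lower:
        return Action.INCORRECT
    return Action.AMBIGUOUS


def gather_knowledge(
    query: str,
    documents: Sequence[str],
    score: Callable[[str, str], float],       # stand-in for the retrieval evaluator
    web_search: Callable[[str], List[str]],   # stand-in for the web search extension
    upper: float = 0.5,
    lower: float = -0.5,
) -> Tuple[Action, List[str]]:
    """Pick the knowledge passed to the generator, based on the chosen action."""
    scores = [score(query, doc) for doc in documents]
    action = choose_action(scores, upper, lower)
    if action is Action.CORRECT:
        # Trust the retrieved documents (any further filtering is omitted here).
        return action, list(documents)
    if action is Action.INCORRECT:
        # Discard the retrieved documents and rely on web results instead.
        return action, list(web_search(query))
    # Ambiguous: hedge by combining both sources.
    return action, list(documents) + list(web_search(query))
```

In this sketch, `score` would be replaced by a trained relevance model and `web_search` by a real search client. The point is only that one confidence signal selects among keeping, replacing, or supplementing the retrieved documents.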

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01·Rating 3·Confidence 5

Strengths

1. The authors propose a cost-effective method to filter, refine, and improve documents in RAG. This method can be applied in various RAG pipelines.
2. Detailed experiments show the effectiveness, robustness, and efficiency of CRAG.

Weaknesses

1. CRAG cannot handle the multi-hop QA task. Since the confidence score is calculated between a query and a single document, it is hard for the evaluator to judge the complex relationships among multiple documents.
2. The filtering and refinement processes depend on the evaluator, which can potentially filter out useful information. Moreover, the paper lacks an evaluation and analysis of the filtering performance. What if the evaluator filters out helpful results?
3. It is challenging to manually

Reviewer 02·Rating 3·Confidence 5

Strengths

1. The paper addresses an important research question. Recognizing the imperfections in existing retriever technologies, it focuses on how to mitigate these issues within the Retrieval-Augmented Generation (RAG) paradigm, thus contributing valuable insights toward developing more robust RAG systems.
2. The paper presents compelling experimental results.

Weaknesses

1. **The experiments in this paper are not sufficient, particularly in the selection of baselines.** The paper lacks comparison with highly relevant prior works that have proposed corrective strategies for Retrieval-Augmented Generation (RAG). Notably, it does not include baselines such as RARR [1] and DRAGIN [2], which are essential for contextualizing the contribution of this work within the existing literature. Including these baselines would strengthen the evaluation and provide a clearer und

Reviewer 03·Rating 6·Confidence 4

Strengths

1. Compared to Standard RAG, CRAG shows significant improvements, although experiments were only conducted on a 7B model.
2. Knowledge Correction is necessary as it can assess the accuracy of retrieval results and prevent irrelevant results from impacting model performance.
3. The paper includes numerous ablation studies, making the experiments overall solid.

Weaknesses

1. A core concern is latency; the introduction of the Knowledge Correction phase significantly increases delays, which CRAG does not discuss. The authors only discuss the time consumption of the generation phase, which could hinder CRAG's practical application.
2. Additionally, the accuracy of the T5-based Retrieval Evaluator is concerning. It is also unclear how the evaluator's accuracy impacts CRAG's generation.
3. The lack of case analysis is notable. It would be beneficial to understand how

Reviewer 04·Rating 3·Confidence 4

Strengths

1. The presentation is clear and easy to follow.
2. The paper presents comprehensive experiments across multiple datasets to show the performance of the proposed method.

Weaknesses

1. The technical contribution of this paper is limited: the proposed method adds little that is new to the RAG area. Its components are similar to existing methods in Information Retrieval or RAG. For example, assessing the quality of retrieved texts to determine whether to retrieve or not has been fully studied in existing RAG methods such as Self-RAG [1] and RetRobust [2]. Using large-scale web resources to replace the static corpus cannot be seen as a technical

Taxonomy

Topics·Speech and dialogue systems

Methods·Focus