KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG

Yongjian Li; HaoCheng Chu; Yukun Yan; Zhenghao Liu; Shi Yu; Zheni Zeng; Ruobing Wang; Sen Song; Zhiyuan Liu; Maosong Sun

arXiv:2506.02503·cs.CL·June 4, 2025

Open Access 3 Reviews

TL;DR

KARE-RAG introduces a knowledge-aware approach that refines how retrieval-augmented models handle noisy data, significantly boosting factual accuracy and robustness across various tasks with minimal additional data.

Contribution

This paper presents a novel framework with structured knowledge representations, a refined training objective, and a contrastive data pipeline to improve RAG models' handling of noisy retrieved content.

Findings

01. Enhanced performance on in-domain tasks

02. Improved out-of-domain robustness

03. Data-efficient training with modest data

Abstract

Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to access broader knowledge sources, yet factual inconsistencies persist due to noise in retrieved documents, even with advanced retrieval methods. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. In this paper, we present KARE-RAG (Knowledge-Aware Refinement and Enhancement for RAG), which improves knowledge utilization through three key innovations: (1) structured knowledge representations that facilitate error detection during training; (2) Dense Direct Preference Optimization (DDPO), a refined training objective that prioritizes correction of critical errors; and (3) a contrastive data generation pipeline that maintains semantic consistency while rectifying factual inaccuracies. Experiments show our method significantly enhances…
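The abstract names DDPO as a refinement of Direct Preference Optimization but, as excerpted here, does not give its formula. For orientation only, the sketch below shows the standard DPO loss for a single preference pair, which is the objective DDPO is described as refining. It is not the paper's DDPO. The `weights` argument is a hypothetical illustration of how some tokens could be emphasized over others; it is an assumption and is not taken from the paper, as are the function names and the default `beta`.

```python
import math


def sequence_logprob(token_logprobs, weights=None):
    """Sum per-token log-probabilities, optionally weighted.

    `weights` is a hypothetical knob (an assumption, not from the paper)
    for emphasizing tokens where chosen and rejected outputs differ.
    """
    if weights is None:
        weights = [1.0] * len(token_logprobs)
    if len(weights) != len(token_logprobs):
        raise ValueError("weights and token_logprobs must have equal length")
    return sum(w * lp for w, lp in zip(weights, token_logprobs))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence-level log-probabilities of the chosen and rejected
    outputs under the policy being trained and under a frozen reference model.
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)), computed stably as softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Toy numbers: the policy favors the chosen output more than the reference does.
    chosen = sequence_logprob([-0.5, -0.5])
    rejected = sequence_logprob([-2.0, -3.0])
    print(dpo_loss(chosen, rejected, ref_chosen=-3.0, ref_rejected=-3.0))
```

The loss shrinks as the policy raises the likelihood of the chosen output relative to the rejected one, measured against the reference model. When the policy and reference agree exactly, the margin is zero and the loss equals log 2.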

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

- This paper proposes to train an LLM with RL by optimizing its ability to extract structured knowledge for QA, which seems to be a valid direction for RAG in general.
- Experiments are done on multiple datasets with different backbone models, and the results show strong performance, particularly on OOD test sets.
- Detailed analysis and ablation studies show the effectiveness of the proposed method.

Weaknesses

- The baseline selection is limited. It would be better to compare with other prompt-based and especially graph-based RAG methods to show the value of knowledge refinement.
- Because the prompt schema is rather complicated, it would be better to report the cost and token consumption of the proposed method and how it compares with the baselines.
- The paper lacks real case studies showing what a good-quality knowledge structure induced by the model looks like and how it helps the final answer.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

+ The paper proposes a practical approach that trains models to generate intermediate structured representations for optimizing RAG, effectively addressing the sparse supervision problem in end-to-end methods like DPO.
+ The method shows strong data efficiency and OOD generalization, suggesting that the model learns a robust strategy for knowledge organization rather than mere memorization.
+ The trained model incurs no additional inference overhead, making it easy to integrate into existing RAG

Weaknesses

- The refinement stage relies heavily on a more advanced expert LLM to generate high-quality positive samples. While this improves data quality, it raises concerns about novelty and practical independence, as the performance gain may largely depend on the capabilities of the external expert model.
- The training primarily uses the Musique dataset, but the paper does not provide sufficient justification for this choice or explore cross-dataset generalization. It would strengthen the work to test

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- High data efficiency: Achieves robust performance with only a small amount of training data, addressing the traditional reliance on large-scale high-quality datasets.
- Strong practicality: Does not alter the standard RAG inference pipeline, incurs no additional computational overhead, and can be seamlessly integrated into existing systems.
- Excellent generalization and compatibility: Delivers superior OOD (out-of-distribution) performance, supports multiple structured representations (e.g., knowle

Weaknesses

- Design of the structured representation. The method relies heavily on carefully designed knowledge representation structures, whose versatility and adaptability to diverse task scenarios have not been fully verified.
- Possible limitations of automated sample generation. The sample refinement pipeline relies on advanced LLMs for error correction, so it may be affected by the performance of the underlying LLMs. The sample quality in extremely complex scenarios is not explained.
- Room for expansion

Taxonomy

Topics: Robotics and Automated Systems · AI-based Problem Solving and Planning · Context-Aware Activity Recognition Systems