ConfRAG: Confidence-Guided Retrieval-Augmenting Generation
Yin Huang, Yifan Ethan Xu, Kai Sun, Vera Yan, Alicia Sun, Haidar Khan, Jimmy Nguyen, Jingxiang Chen, Mohammad Kachuee, Zhaojiang Lin, Yue Liu, Aaron Colak, Anuj Kumar, Wen-tau Yih, Xin Luna Dong

TL;DR
ConfRAG combines confidence calibration with selective retrieval to significantly reduce hallucinations and retrieval costs in large language models, achieving high accuracy with fewer external calls.
Contribution
This work introduces ConfRAG, a novel confidence-guided retrieval-augmentation method that effectively minimizes hallucinations and unnecessary retrievals in LLMs.
Findings
Hallucination rates reduced from 20-40% to below 5%.
Over 30% reduction in external retrievals.
Achieves above 95% accuracy in ideal conditions.
Abstract
Can Large Language Models (LLMs) be trained to avoid hallucinating factual statements, and can Retrieval-Augmented Generation (RAG) be triggered only when necessary to reduce retrieval and computation costs? In this work, we address both challenges simultaneously. We introduce ConfQA, a fine-tuning strategy that reduces hallucination rates from 20-40% to below 5% across multiple factuality benchmarks. The approach is simple: when the model answers correctly, it is trained to output the answer; otherwise, it is trained to respond with "I am unsure". Two design choices make this training effective: (1) a dampening prompt ("answer only if you are confident") that explicitly discourages overconfident hallucinations, and (2) training data drawn from atomic factual statements (e.g., knowledge graph attribute values), which calibrates model confidence and yields robust generalization across…
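The labelling rule and the retrieval trigger described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_sft_example`, `answer_with_selective_retrieval`), the exact-match correctness check, and appending the dampening prompt to the question are all hypothetical choices made for the example. Only the dampening-prompt wording and the "I am unsure" response come from the abstract.

```python
from typing import Callable, Dict

# Wording taken from the abstract; where it sits in the prompt is an assumption.
DAMPENER = "Answer only if you are confident."
UNSURE = "I am unsure"


def normalize(text: str) -> str:
    """Lowercase, drop a trailing period, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def is_correct(prediction: str, gold: str) -> bool:
    # Assumption: exact match after normalization. The paper's actual
    # correctness judgment is not specified in this excerpt.
    return normalize(prediction) == normalize(gold)


def build_sft_example(question: str, gold: str, base_answer: str) -> Dict[str, str]:
    """Build one fine-tuning pair: keep the answer if the base model was
    right, otherwise train the model to say it is unsure."""
    target = gold if is_correct(base_answer, gold) else UNSURE
    return {"prompt": f"{question} {DAMPENER}", "completion": target}


def answer_with_selective_retrieval(
    question: str,
    model: Callable[[str], str],
    rag: Callable[[str], str],
) -> str:
    """Call the (hypothetical) retrieval pipeline only when the tuned
    model declines to answer."""
    draft = model(f"{question} {DAMPENER}")
    if normalize(draft) == normalize(UNSURE):
        return rag(question)
    return draft
```

In this sketch, `model` and `rag` are plain callables so that any fine-tuned model and any retrieval pipeline can be plugged in. The retrieval cost is paid only on questions where the model answers "I am unsure".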
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The paper methodically addresses the RAG triggering problem with clear motivation (Figure 3 shows LLMs are overconfident)
- Strong empirical results:
  1. Consistent hallucination reduction across diverse benchmarks
  2. Maintains or improves factuality scores
  3. Practical latency improvements demonstrated
- The dampening prompt and DBPedia focus are well-motivated through ablations
- 7 benchmarks covering different question types and domains show generalization
- Detailed prompts, implementati
- The core contribution beyond R-Tuning and IDK appears to be (1) the dampening prompt and (2) using DBPedia instead of MMLU. While effective, this feels incremental.
- Training data limitations:
  1. Only 3K samples seems small for teaching general confidence behavior
  2. DBPedia focus on entity attributes may not transfer well to other factual question types
  3. No systematic study of data diversity vs. quality trade-offs
- Fine-tuning results are primarily on Llama-3.1-70B. Claims about genera
1. The motivation is clear. This work focuses on the important issues of LLM hallucination and computational efficiency in RAG.
2. The use of confidence signaling ("I am unsure") and SFT objectives makes the framework conceptually straightforward.
3. Experimental results demonstrate improvements over baselines across multiple benchmarks, notably lowering hallucination rates.
1. The primary limitation of this work is its lack of novelty. Training with the 'unknown' token is a commonly employed technique in many existing RAG systems (e.g., [1]). This study does not offer substantial new insights beyond some empirical observations.
2. Several design choices are not clearly explained. For example, the rationale behind the design of the dampener prompt and the "unsure" answer remains unclear. Is model performance highly sensitive to the choice of prompt and answer? Additio
1. The paper focuses on an important question: teaching the model to recognize its own knowledge boundaries and to trigger retrieval only when it does not know the answer.
2. The paper is well written and logically coherent.
3. The paper uses the principle of "answer only if you are confident" to suppress overconfidence, and it trains on atomic facts, which leads to relatively high accuracy.
1. The paper lacks novelty: the idea of triggering retrieval only when the model does not know the answer is not new.
2. There have been many works between 2023 and 2024 that use SFT (Supervised Fine-Tuning) to enable models to express uncertainty, and this paper is not fundamentally different from those approaches.
3. The paper includes too few baselines and lacks citations to several foundational works in the areas of adaptive RAG and LLM knowledge boundary perception.
[1] SAC3: Reliable Ha
The proposed fine-tuning method can teach an LLM to refrain from generating low-confidence outputs while simultaneously improving retrieval efficiency.
Clarity - The paper's central premise is to use model uncertainty as a trigger for retrieval. However, there appears to be a fundamental mismatch between this trigger and the fine-tuning objective, which is based on correctness. The paper does not sufficiently address the gap between model confidence and answer correctness. For instance, a model can be uncertain about a correct answer or highly confident in an incorrect one. This discrepancy seems to undermine the core mechanism, and it is uncle
Taxonomy
Topics: Topic Modeling · Advanced Image and Video Retrieval Techniques · Recommender Systems and Techniques
