Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
Tong Chen, Akari Asai, Luke Zettlemoyer, Hannaneh Hajishirzi, Faeze Brahman

TL;DR
This paper introduces a binary retrieval-augmented reward for training language models. By rewarding only fully factual outputs, the method substantially reduces hallucinations while maintaining performance on a range of other tasks.
Contribution
The paper presents a novel binary reward scheme for reinforcement learning that effectively mitigates hallucinations without degrading open-ended generation or downstream task performance.
Findings
39.3% reduction in hallucination rates on open-ended generation
44.4% fewer incorrect answers on PopQA
No performance degradation on instruction following, math, or code
Abstract
Language models often generate factually incorrect information unsupported by their training data, a phenomenon known as extrinsic hallucination. Existing mitigation approaches often degrade performance on open-ended generation and downstream tasks, limiting their practical utility. We propose an online reinforcement learning method using a novel binary retrieval-augmented reward (RAR) to address this tradeoff. Unlike continuous reward schemes, our approach assigns a reward of one only when the model's output is entirely factually correct, and zero otherwise. We evaluate our method on Qwen3 reasoning models across diverse tasks. For open-ended generation, binary RAR achieves a 39.3% reduction in hallucination rates, substantially outperforming both supervised training and continuous-reward RL baselines. In short-form question answering, the model learns calibrated abstention,…
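As an illustration of the reward design the abstract describes, here is a minimal Python sketch contrasting a binary reward with a proportion-based continuous reward. It is not the authors' implementation. The function names, the `contradiction_found` flag (standing in for the output of some retrieval-backed verifier), and the per-claim correctness list are all hypothetical, and the handling of an output with no claims is an assumption made only so the example runs.

```python
from typing import List


def binary_reward(contradiction_found: bool) -> float:
    """Binary reward: 1.0 only when no factual error was detected, else 0.0.

    `contradiction_found` is a hypothetical flag assumed to come from a
    verifier that checks the whole output against retrieved evidence.
    """
    return 0.0 if contradiction_found else 1.0


def proportional_reward(claim_is_correct: List[bool]) -> float:
    """Continuous reward for contrast: the fraction of claims judged correct.

    Returning 0.0 for an empty claim list is an assumption of this sketch.
    """
    if not claim_is_correct:
        return 0.0
    return sum(claim_is_correct) / len(claim_is_correct)


# A response with nine correct claims and one incorrect claim.
claims = [True] * 9 + [False]
contradiction = not all(claims)

print(proportional_reward(claims))   # partial credit despite the error
print(binary_reward(contradiction))  # no credit, because one error exists
```

Under these assumptions, a response with a single factual error still earns most of the continuous reward but none of the binary one, which is the distinction the abstract draws between the two schemes.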
Peer Reviews
Decision: Submitted to ICLR 2026
* The writing is clear, and the proposed method is straightforward.
* The experiments are thorough, and the work is relatively comprehensive.
* The main difference between this work and the most relevant baseline, VeriScore, lies in the design of the reward signal. While VeriScore uses a soft reward based on the proportion of correct claims, this paper adopts a hard binary reward that simply judges whether any factual inconsistency exists. Apart from this modification, there are no substantial differences between the two approaches. Overall, this incremental improvement makes the work resemble more of a technical report than a full re
1. The method is simple and effective, easy to implement from an engineering perspective, and demonstrates greater robustness compared to dense reward models.
2. The experimental design simultaneously addresses both hallucination mitigation and preservation of general capabilities.
3. Experimental results show that the proposed approach improves the model’s factual accuracy, achieving strong overall performance.
1. The authors claim that full-text contradiction detection avoids the “error accumulation” of claim-wise verification, but this statement lacks justification. Claim-level verification is an independent process without cumulative error, whereas full-text inputs may introduce contextual interference and order bias. A controlled comparison between the two verification granularities is recommended to support this claim.
2. The “I don’t know” samples are not human-labeled but automatically detected
1. The paper replaces traditional continuous factuality scores with a binary retrieval-augmented reward, explicitly optimizing the model toward *avoiding factual errors*. This design leads to a simple yet effective mechanism for factual correction and hallucination reduction.
2. The experiments are well-structured and comprehensive, including ablation studies, multi-task and multi-model evaluations, and assessments of general capabilities. The results appear credible and consistent.
3. The paper
1. The optimization objective fully depends on retrieved evidence: the binary reward is determined by whether the retrieved documents contradict the model output. Since retrieval systems inherently contain bias (e.g., source preference, recency errors, and incomplete coverage), the model is effectively optimized to align with retrieval consensus rather than the ground truth. In essence, the model learns retrieval alignment, not truth alignment.
2. This RLKF paper (https://arxiv.org/abs/2403.183
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
