On-Policy Self-Alignment with Fine-grained Knowledge Feedback for Hallucination Mitigation
Xueru Wen, Jie Lou, Xinyu Lu, Ji Yuqiu, Xinyan Guan, Yaojie Lu, Hongyu Lin, Ben He, Xianpei Han, Debing Zhang, Le Sun

TL;DR
This paper introduces RLFH, an on-policy reinforcement learning method that uses fine-grained self-assessment and external knowledge to reduce hallucinations in large language models during response generation.
Contribution
The paper proposes a novel on-policy self-alignment framework with fine-grained feedback, enabling LLMs to self-correct hallucinations without human intervention.
Findings
RLFH significantly reduces hallucinations on multiple benchmarks.
Fine-grained, statement-level feedback improves model accuracy.
Online reinforcement learning enhances knowledge adherence.
Abstract
Hallucination occurs when large language models exhibit behavior that deviates from the boundaries of their knowledge during response generation. To address this critical issue, previous learning-based methods attempt to finetune models but are limited by off-policy sampling and coarse-grained feedback. In this paper, we present Reinforcement Learning for Hallucination (RLFH), an on-policy self-alignment approach that enables LLMs to actively explore their knowledge boundaries and self-correct generation behavior through fine-grained feedback signals. RLFH introduces a self-assessment framework where the policy serves as its own judge. Through this framework, responses are automatically decomposed into atomic facts and their truthfulness and informativeness are assessed against external knowledge sources. The resulting fine-grained feedback at the statement…
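To make the idea of statement-level feedback concrete, the following is a minimal sketch of how per-statement judgments might be turned into a token-aligned reward signal. It is not the paper's implementation: the abstract is truncated before it explains how the feedback is used, so the `Statement` class, the label set, the values in `TRUTH_REWARD`, the informativeness scaling, and the choice to place each reward on a statement's last token are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical reward values per truthfulness label (illustrative only,
# not taken from the paper).
TRUTH_REWARD = {"correct": 1.0, "vague": 0.0, "wrong": -1.0}


@dataclass
class Statement:
    """One atomic fact extracted from a response (hypothetical structure)."""
    text: str
    end_token: int          # index of the last response token covered by the fact
    truthfulness: str       # "correct", "vague", or "wrong"
    informativeness: float  # assumed to lie in [0, 1]


def statement_reward(s: Statement) -> float:
    """Combine truthfulness and informativeness into one scalar.

    Assumption: correct facts earn more when they are informative,
    while penalties for wrong facts are not scaled down.
    """
    base = TRUTH_REWARD[s.truthfulness]
    return base * s.informativeness if base > 0 else base


def token_rewards(num_tokens: int, statements: List[Statement]) -> List[float]:
    """Place each statement's reward on the token where the statement ends."""
    rewards = [0.0] * num_tokens
    for s in statements:
        rewards[s.end_token] += statement_reward(s)
    return rewards


if __name__ == "__main__":
    facts = [
        Statement("Paris is the capital of France", 5, "correct", 0.5),
        Statement("It was founded in 1900", 9, "wrong", 0.8),
    ]
    print(token_rewards(10, facts))
```

A dense, token-aligned signal of this kind is what distinguishes fine-grained feedback from a single response-level score: an on-policy learner can see which part of a response was penalized rather than only that the response as a whole was poor.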
Taxonomy
Topics: Pharmacovigilance and Adverse Drug Reactions · Mental Health Research Topics · Computational Drug Discovery Methods
