On-Policy Self-Alignment with Fine-grained Knowledge Feedback for Hallucination Mitigation

Xueru Wen; Jie Lou; Xinyu Lu; Ji Yuqiu; Xinyan Guan; Yaojie Lu; Hongyu Lin; Ben He; Xianpei Han; Debing Zhang; Le Sun

arXiv:2406.12221 · cs.CL · May 27, 2025 · 1 citation

TL;DR

This paper introduces RLFH, an on-policy reinforcement learning method that uses fine-grained self-assessment and external knowledge to reduce hallucinations in large language models during response generation.

Contribution

The paper proposes a novel on-policy self-alignment framework with fine-grained feedback, enabling LLMs to self-correct hallucinations without human intervention.

Findings

01 RLFH significantly reduces hallucinations on multiple benchmarks.

02 Fine-grained, statement-level feedback improves model accuracy.

03 Online reinforcement learning enhances knowledge adherence.

Abstract

Hallucination occurs when large language models exhibit behavior that deviates from the boundaries of their knowledge during response generation. To address this critical issue, previous learning-based methods attempt to finetune models but are limited by off-policy sampling and coarse-grained feedback. In this paper, we present Reinforcement Learning for Hallucination (RLFH), an on-policy self-alignment approach that enables LLMs to actively explore their knowledge boundaries and self-correct generation behavior through fine-grained feedback signals. RLFH introduces a self-assessment framework where the policy serves as its own judge. Through this framework, responses are automatically decomposed into atomic facts and their truthfulness and informativeness are assessed against external knowledge sources. The resulting fine-grained feedback at the statement…
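The decompose-then-assess step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `StatementFeedback`, `split_into_statements`, `assess`, and `statement_reward` are hypothetical, the sentence split stands in for the policy-driven decomposition into atomic facts, the set lookup stands in for verification against an external knowledge source, and the reward values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StatementFeedback:
    """Feedback for one atomic statement (hypothetical structure)."""
    statement: str
    truthful: bool          # is the statement supported by external knowledge?
    informativeness: float  # 0.0 (trivial) to 1.0 (highly informative)


def split_into_statements(response: str) -> List[str]:
    # Assumption: a naive split on periods stands in for the
    # decomposition into atomic facts performed by the policy itself.
    return [s.strip() for s in response.split(".") if s.strip()]


def assess(
    response: str,
    is_supported: Callable[[str], bool],
    informativeness: Callable[[str], float],
) -> List[StatementFeedback]:
    # Judge every atomic statement on both axes named in the abstract.
    return [
        StatementFeedback(s, is_supported(s), informativeness(s))
        for s in split_into_statements(response)
    ]


def statement_reward(fb: StatementFeedback, penalty: float = -1.0) -> float:
    # Assumed scheme: a truthful statement earns its informativeness,
    # an unsupported one receives a fixed penalty.
    return fb.informativeness if fb.truthful else penalty


# Toy usage: a one-fact "knowledge source" and a constant informativeness score.
knowledge = {"Paris is the capital of France"}
response = "Paris is the capital of France. Paris has 90 million residents."
feedback = assess(response, lambda s: s in knowledge, lambda s: 0.5)
rewards = [statement_reward(fb) for fb in feedback]
print(rewards)
```

Scoring each statement separately, rather than the response as a whole, is what makes the feedback fine-grained: the supported statement and the unsupported one receive different signals even though they appear in the same response.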

Code & Models

Repositories

AlignRM/RLFH
pytorch
