Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback

Hongshen Xu; Zichen Zhu; Situo Zhang; Da Ma; Shuai Fan; Lu Chen; Kai Yu

arXiv:2403.18349 · cs.CL · August 9, 2024 · 2 citations

TL;DR

This paper introduces a reinforcement learning framework that trains large language models to reject questions beyond their knowledge, thereby reducing hallucinations and improving response reliability.

Contribution

It proposes Reinforcement Learning from Knowledge Feedback (RLKF), a novel RL-based alignment method that dynamically identifies a model's knowledge boundaries and trains it to refuse questions that fall outside them.

Findings

01. RLKF significantly reduces hallucinations in LLMs.
02. Models trained with RLKF show improved accuracy on mathematical questions.
03. Rejection mechanisms enhance overall model reliability.

Abstract

Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations, due to their limitations in discerning questions beyond their knowledge scope. While addressing hallucination has been a focal point in research, previous efforts primarily concentrate on enhancing correctness without giving due consideration to the significance of rejection mechanisms. In this paper, we conduct a comprehensive examination of the role of rejection, introducing the notion of model reliability along with corresponding metrics. These metrics measure the model's ability to provide accurate responses while adeptly rejecting questions exceeding its knowledge boundaries, thereby minimizing hallucinations. To improve the inherent reliability of LLMs, we present a novel alignment framework called Reinforcement Learning from Knowledge Feedback (RLKF). RLKF leverages knowledge feedback to…
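The abstract is truncated before it defines its reliability metrics, so the following is only a minimal sketch of the underlying idea, assuming each model response has already been labeled as correct, wrong (a hallucination), or refused. The names `score`, `ReliabilityReport`, and the four rates are hypothetical illustrations, not the paper's metric definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Literal

# Hypothetical outcome labels for a single question/response pair.
Outcome = Literal["correct", "wrong", "refused"]


@dataclass(frozen=True)
class ReliabilityReport:
    accuracy: float       # share of all questions answered correctly
    hallucination: float  # share of all questions answered wrongly
    refusal: float        # share of all questions the model declined
    truthfulness: float   # share that was NOT a hallucination (correct or refused)


def score(outcomes: Iterable[Outcome]) -> ReliabilityReport:
    """Tally labeled outcomes into rates (hypothetical metrics, not the paper's)."""
    counts = {"correct": 0, "wrong": 0, "refused": 0}
    for outcome in outcomes:
        if outcome not in counts:
            raise ValueError(f"unknown outcome: {outcome!r}")
        counts[outcome] += 1

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes to score")

    accuracy = counts["correct"] / total
    hallucination = counts["wrong"] / total
    refusal = counts["refused"] / total
    # A refusal is not counted as an error, so truthfulness = 1 - hallucination rate.
    return ReliabilityReport(accuracy, hallucination, refusal, 1.0 - hallucination)


if __name__ == "__main__":
    report = score(["correct", "correct", "refused", "wrong"])
    print(report)
```

For the four labeled answers in the usage example, accuracy is 0.5, the hallucination and refusal rates are 0.25 each, and truthfulness is 0.75. Tracking more than one number matters here: a model that refuses every question scores a perfect truthfulness of 1.0 with zero accuracy, which is why a reliability measure has to reward correct answers and appropriate refusals together rather than either alone.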

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling