LLMs can learn self-restraint through iterative self-reflection
Alexandre Piché, Aristides Milios, Dzmitry Bahdanau, Chris Pal

TL;DR
This paper introduces ReSearch, an iterative self-reflection process that enables large language models to learn self-restraint. By selectively abstaining when their confidence is low, the models hallucinate less and behave more safely.
Contribution
It proposes a novel self-reflection training method that teaches LLMs to modulate responses based on uncertainty, enhancing safety without extra inference costs.
Findings
Models generate fewer hallucinations on known and unknown topics.
Self-restraint improves safety by enabling abstention when uncertain.
ReSearch effectively incorporates abstention into model responses.
Abstract
In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a utility function that can encourage the model to produce responses only when it is confident in them. This utility function can be used to score generations of different lengths as well as abstention. To optimize this function, we introduce ReSearch, a process of "self-reflection" consisting of iterative self-prompting and self-evaluation. We use the ReSearch…
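The abstract does not give the form of the utility function, so the following is only a minimal sketch of how one such function could behave, not the authors' definition. It assumes a response is split into claims, each with a self-estimated confidence; claims at or above a threshold earn a reward, claims below it incur a penalty, and abstention (no claims) scores zero. The names `utility`, `pick_response`, `threshold`, and `penalty`, along with their default values, are all hypothetical.

```python
from typing import Dict, Sequence


def utility(claim_confidences: Sequence[float],
            threshold: float = 0.7,
            penalty: float = 2.0) -> float:
    """Score a response from per-claim confidence estimates (hypothetical form).

    An empty sequence stands for abstention and scores 0, so responses of any
    length and abstention are all comparable on the same scale.
    """
    score = 0.0
    for confidence in claim_confidences:
        if confidence >= threshold:
            score += 1.0      # reward a claim the model is confident in
        else:
            score -= penalty  # penalize a claim the model is unsure about
    return score


def pick_response(candidates: Dict[str, Sequence[float]]) -> str:
    """Return the highest-utility candidate, or "ABSTAIN" if none beats 0."""
    best_name, best_score = "ABSTAIN", 0.0
    for name, confidences in candidates.items():
        score = utility(confidences)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Under these assumptions, a shorter response containing only confident claims can outscore a longer one that includes a doubtful claim, and abstaining wins whenever every candidate has negative utility.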
Taxonomy
Topics: Machine Learning in Healthcare
