Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism
Lang Cao

TL;DR
This paper introduces Learn to Refuse (L2R), a method that improves LLM reliability by enabling models to refuse to answer questions outside their knowledge scope, thereby reducing hallucinations and increasing controllability.
Contribution
The paper proposes a novel refusal mechanism combined with a structured knowledge base to enhance LLM control and reliability, including an automatic knowledge base expansion method.
Findings
L2R reduces hallucinations in LLMs during question answering.
A structured knowledge base improves answer accuracy and traceability.
Automatic knowledge base expansion enhances model understanding over time.
Abstract
Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find…
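The refusal idea described above can be illustrated with a small sketch. This is not the paper's implementation: the `Fact` record, the word-overlap score, the `threshold` value, and the `answer_or_refuse` function are all hypothetical names and choices made for illustration. It assumes a knowledge base of plain-text facts with identifiers, and it refuses whenever no stored fact matches the question closely enough. A real system would presumably use an LLM and a stronger retriever in place of the overlap score.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical refusal message; the wording is an assumption.
REFUSAL = "I refuse to answer: this question is outside my knowledge scope."


@dataclass(frozen=True)
class Fact:
    """One entry of a structured knowledge base (illustrative schema)."""
    fact_id: int
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


def overlap_score(question: str, fact: Fact) -> float:
    """Jaccard overlap between question words and fact words (a stand-in retriever)."""
    q, f = tokenize(question), tokenize(fact.text)
    if not q or not f:
        return 0.0
    return len(q & f) / len(q | f)


def answer_or_refuse(
    question: str, knowledge_base: List[Fact], threshold: float = 0.3
) -> Tuple[str, Optional[int]]:
    """Return (answer, supporting fact id), or (REFUSAL, None) if no fact is close enough."""
    if not knowledge_base:
        return REFUSAL, None
    best = max(knowledge_base, key=lambda fact: overlap_score(question, fact))
    if overlap_score(question, best) < threshold:
        return REFUSAL, None
    # The fact id is returned so that the answer can be traced to its source.
    return best.text, best.fact_id


kb = [
    Fact(1, "Paris is the capital of France."),
    Fact(2, "Water boils at 100 degrees Celsius at sea level."),
]
in_scope = answer_or_refuse("What is the capital of France?", kb)
out_of_scope = answer_or_refuse("Who won the 2022 World Cup?", kb)
```

In this sketch the in-scope question is answered from fact 1, while the out-of-scope question is refused because no stored fact overlaps with it enough. Returning the fact identifier alongside the answer is one simple way to make each answer traceable to the knowledge base.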
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Focus · Balanced Selection
