Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism

Lang Cao

arXiv:2311.01041 · cs.CL · September 23, 2024 · 1 citation

TL;DR

This paper introduces Learn to Refuse (L2R), a method that improves LLM reliability by enabling models to refuse to answer questions outside their knowledge scope, thereby reducing hallucinations and increasing controllability.

Contribution

The paper proposes a novel refusal mechanism combined with a structured knowledge base to enhance LLM control and reliability, including an automatic knowledge base expansion method.
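The idea of answering only from a structured knowledge base and refusing otherwise can be illustrated with a minimal sketch. This is not the paper's implementation: the `Fact` type, the `answer_or_refuse` function, the word-overlap (Jaccard) relevance score, and the `threshold` value are all assumptions made for illustration, and a real system would use an LLM and a proper retriever in their place. Returning the identifier of the supporting fact is one simple way to keep answers traceable.

```python
from dataclasses import dataclass

# Hypothetical refusal message; the wording is an assumption.
REFUSAL = "I cannot answer this question with the knowledge available to me."


@dataclass(frozen=True)
class Fact:
    """One entry in a toy structured knowledge base (hypothetical type)."""
    fact_id: int
    text: str


def tokenize(text):
    """Lowercase the text and split it into a set of punctuation-stripped words."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def answer_or_refuse(question, knowledge_base, threshold=0.3):
    """Return (answer, supporting_fact_id), or (REFUSAL, None).

    The answer is drawn only from the knowledge base. If no fact is
    similar enough to the question, the function refuses instead of
    guessing. The similarity measure and threshold are illustrative
    assumptions, not values taken from the paper.
    """
    q_tokens = tokenize(question)
    scored = [(jaccard(q_tokens, tokenize(f.text)), f) for f in knowledge_base]
    best_score, best_fact = max(scored, key=lambda p: p[0], default=(0.0, None))
    if best_fact is None or best_score < threshold:
        return REFUSAL, None
    return best_fact.text, best_fact.fact_id


kb = [
    Fact(1, "Paris is the capital of France."),
    Fact(2, "Water boils at 100 degrees Celsius at sea level."),
]
in_scope = answer_or_refuse("What is the capital of France?", kb)
out_of_scope = answer_or_refuse("Who won the 2010 World Cup?", kb)
```

In this sketch the first question is answered from fact 1, while the second falls below the threshold and is refused. An empty knowledge base always leads to a refusal, which reflects the design choice of limiting the model's scope to what the knowledge base contains.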

Findings

1. L2R reduces hallucinations in LLMs during question-answering.

2. Structured knowledge base improves answer accuracy and traceability.

3. Automatic knowledge base expansion enhances model understanding over time.

Abstract

Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Focus · Balanced Selection