GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation
Runchuan Zhu, Zinco Jiang, Jiang Wu, Zhipeng Ma, Jiahe Song, Fengshuo Bai, Dahua Lin, Lijun Wu, Conghui He

TL;DR
GRAIT is an instruction tuning framework that uses gradient-driven sample selection and adaptive weighting to improve LLMs' ability to refuse questions beyond their knowledge, reducing hallucinations while maintaining helpfulness.
Contribution
This paper introduces GRAIT, a gradient-based refusal-aware instruction tuning method that balances hallucination mitigation and helpfulness in LLMs.
Findings
GRAIT significantly outperforms existing methods in refusal accuracy.
GRAIT reduces hallucinations effectively in question answering tasks.
The method maintains high response usefulness while rejecting unknown questions.
Abstract
Refusal-Aware Instruction Tuning (RAIT) aims to enhance Large Language Models (LLMs) by improving their ability to refuse responses to questions beyond their knowledge, thereby reducing hallucinations and improving reliability. Effective RAIT must address two key challenges: first, effectively rejecting unknown questions to minimize hallucinations; second, avoiding over-refusal so that questions that can be correctly answered are not rejected, thereby maintaining the helpfulness of LLM outputs. In this paper, we address the two challenges by deriving insightful observations from the gradient-based perspective and proposing the Gradient-driven Refusal-Aware Instruction Tuning framework (GRAIT), which (1) employs gradient-driven sample selection to effectively minimize hallucinations and (2) introduces an adaptive weighting mechanism during fine-tuning to reduce the risk of over-refusal, achieving…
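The abstract names two ingredients, gradient-driven sample selection and adaptive weighting, but does not give their formulas here. The following is only a toy sketch of the general idea, not the paper's method. It relies on the standard first-order fact that a gradient step on one sample changes the loss on another set roughly in proportion to the negative dot product of their gradients. The function names, the scoring rule, and the weighting formula are all hypothetical choices made for illustration, and the gradients are hand-written 2-D vectors rather than real model gradients.

```python
# Toy sketch (hypothetical, NOT the GRAIT algorithm): rank candidate refusal
# samples by gradient alignment and down-weight those likely to hurt answering.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_and_weight(refusal_grads, answer_grads, k):
    """Return up to k (index, weight) pairs for candidate refusal samples.

    Assumed scoring rule: a sample is useful for refusal if its gradient
    aligns with the mean refusal gradient. Assumed weighting rule: if its
    gradient opposes the mean gradient of answerable samples (a step on it
    would raise their loss, i.e. risk over-refusal), shrink its weight.
    """
    g_refuse = mean_vector(refusal_grads)
    g_answer = mean_vector(answer_grads)
    scored = []
    for idx, g in enumerate(refusal_grads):
        help_refusal = dot(g, g_refuse)        # > 0: step lowers refusal loss
        effect_on_answers = dot(g, g_answer)   # < 0: step raises answer loss
        weight = 1.0 / (1.0 + max(0.0, -effect_on_answers))
        scored.append((help_refusal, idx, weight))
    # Highest alignment with the refusal objective first.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(idx, weight) for _, idx, weight in scored[:k]]


# Hand-made per-sample gradients, for illustration only.
refusal_grads = [[1.0, 0.0], [0.5, -1.0], [-1.0, 0.2]]
answer_grads = [[0.0, 1.0], [0.2, 0.8]]
selected = select_and_weight(refusal_grads, answer_grads, k=2)
print(selected)
```

In a real fine-tuning setup the vectors would be per-sample loss gradients with respect to model parameters (or a low-dimensional projection of them), and the returned weights would scale each selected sample's loss term.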
Taxonomy
Topics: Treatment of Major Depression · Schizophrenia research and treatment · Hallucinations in medical conditions
