Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement
Ying Zhang, Congyu Qiao, Xin Geng, Ning Xu

TL;DR
This paper introduces LANCE, a method that uses label enhancement and variational inference to make LLM refusals more flexible and natural without compromising safety.
Contribution
LANCE is a novel approach that predicts a distribution over rejection categories to enable nuanced, safe responses, reducing rigid rejections in LLMs.
Findings
LANCE significantly reduces rigid rejection incidents.
LANCE maintains high safety standards while improving response naturalness.
LANCE outperforms baseline models in helpfulness and naturalness.
Abstract
Large Language Models (LLMs) rely on safety alignment to comply with safe requests while refusing harmful ones. However, traditional refusal mechanisms often lead to "rigid rejection," where a general template (e.g., "I cannot fulfill this request") indiscriminately triggers refusals and severely undermines the naturalness of interactions between humans and LLMs. To address this issue, this paper proposes LANCE, which ensures safe yet flexible and natural responses via label enhancement. Specifically, LANCE employs variational inference to perform label enhancement, predicting a continuous distribution across multiple rejection categories. These fine-grained rejection distributions provide multi-way textual gradients for a refinement model to neutralize the hazardous elements in the prompt, so that LLMs can generate safe responses that avoid rigid rejections while preserving the…
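To make the idea of a "continuous distribution across multiple rejection categories" concrete, here is a minimal sketch contrasting a single hard label with a soft label distribution. It is not LANCE's variational inference procedure: the category names, the scores, and the use of a plain softmax are all assumptions chosen for illustration, since the abstract does not specify the taxonomy or the model.

```python
import math

# Hypothetical rejection categories; the paper's actual taxonomy is not given here.
CATEGORIES = ["violence", "privacy", "illegal_activity", "self_harm"]


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(scores):
    """Hard label: all mass on the single highest-scoring category."""
    top = scores.index(max(scores))
    return [1.0 if i == top else 0.0 for i in range(len(scores))]


# Made-up scores for one prompt, for illustration only.
scores = [2.0, 1.5, 0.1, -1.0]

# Soft label distribution: every category keeps a degree of relevance.
label_distribution = dict(zip(CATEGORIES, softmax(scores)))

# Hard label: the secondary "privacy" signal is discarded entirely.
logical_label = dict(zip(CATEGORIES, one_hot(scores)))

for name in CATEGORIES:
    print(f"{name:>16}: soft={label_distribution[name]:.3f} hard={logical_label[name]:.0f}")
```

The hard label records only that the prompt falls under one category, while the soft distribution also retains how strongly the other categories apply. That per-category information is the kind of fine-grained signal the abstract says is passed on to the refinement model.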
