Please refuse to answer me! Mitigating Over-Refusal in Large Language Models via Adaptive Contrastive Decoding

Yupeng Qi; Ziyu Lyu; Lixin Cui; Lu Bai; Feng Xia

arXiv:2604.17132·cs.CL·April 21, 2026

TL;DR

This paper introduces AdaCD, a training-free, model-agnostic method that adaptively adjusts refusal behavior in large language models to better balance safety and usability.

Contribution

It proposes Adaptive Contrastive Decoding (AdaCD), a decoding-time approach that mitigates over-refusal of harmless queries without retraining while keeping the refusal ratio for malicious queries high.

Findings

01 Reduces the refusal ratio for over-refusal queries by 10.35% on average.

02 Increases the refusal ratio for malicious queries by 0.13%.

03 Works across five benchmark datasets.

Abstract

Safety-aligned large language models (LLMs) often generate refusal responses to harmless queries due to the over-refusal problem. However, existing methods for mitigating over-refusal cannot maintain a low refusal ratio for harmless queries while keeping a high refusal ratio for malicious ones. In this paper, we analyze how system prompts with varying safety levels affect LLM refusal behaviors when facing over-refusal queries. A key observation is that, when LLMs suffer from the over-refusal issue, non-refusal tokens remain present in the next-token candidate list, but the model systematically fails to select them, despite the generation of refusal tokens. Based on this observation, we propose a training-free and model-agnostic approach, Adaptive Contrastive Decoding (AdaCD), to mitigate over-refusal while maintaining LLM safety. First, AdaCD compares the output distributions of the LLM…
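The abstract is cut off before it describes how AdaCD combines the two output distributions, so the following is only a minimal sketch of a generic contrastive decoding step, not the paper's rule. It illustrates the key observation: a non-refusal token can sit in the candidate list and still lose to a refusal token under greedy selection, and contrasting two distributions can promote it. Everything in it is an assumption made for illustration: the three-token vocabulary, the logits, the two prompt conditions, and the parameters `alpha` and `top_k`. The adaptive part of AdaCD is not modeled.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def contrastive_scores(logits_a, logits_b, alpha=1.0, top_k=2):
    """Generic contrastive scoring (illustrative, NOT the paper's formula).

    logits_a: next-token logits under the prompt we decode with (hypothetical).
    logits_b: logits under a contrast prompt, e.g. a more safety-heavy
              system prompt (hypothetical).
    Only the top_k candidates under logits_a are scored, so tokens that are
    implausible under the decoding prompt cannot be promoted.
    """
    lp_a = log_softmax(logits_a)
    lp_b = log_softmax(logits_b)
    candidates = sorted(range(len(lp_a)), key=lambda i: lp_a[i], reverse=True)[:top_k]
    # Reward tokens that the contrast prompt makes comparatively less likely.
    return {i: lp_a[i] + alpha * (lp_a[i] - lp_b[i]) for i in candidates}

# Toy vocabulary and made-up logits for a harmless query.
vocab = ["Sorry", "Sure", "the"]
logits_default = [2.0, 1.5, 0.0]       # "Sure" is a candidate but loses greedily
logits_safety_heavy = [4.0, 0.5, 0.0]  # refusal token dominates even more

greedy_token = vocab[max(range(len(vocab)), key=lambda i: logits_default[i])]
scores = contrastive_scores(logits_default, logits_safety_heavy)
contrastive_token = vocab[max(scores, key=scores.get)]

print(greedy_token, contrastive_token)
```

In this toy setting, greedy selection picks the refusal token while the contrastive score picks the non-refusal candidate. Applying such a shift unconditionally would also weaken refusals of malicious queries, which is presumably why the method is adaptive; how AdaCD decides when and in which direction to contrast is not recoverable from the truncated abstract.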

Code & Models

Repositories

OutdoorManofML/AdaCD (GitHub)
