Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context
Zhihao Zhang, Liting Huang, Guanghao Wu, Preslav Nakov, Heng Ji, Usman Naseem

TL;DR
This paper introduces Health-ORSC-Bench, a large-scale benchmark for evaluating language models on over-refusal and safe completion in healthcare contexts. Its results show that balancing safety and helpfulness remains difficult across model sizes and types.
Contribution
It presents the first large-scale benchmark for measuring over-refusal and safe completion in healthcare LLMs, built with an automated pipeline and human validation, and evaluates 30 models, revealing calibration challenges.
Findings
Larger models tend to be more safety-pessimistic and over-refuse benign prompts.
Safety-optimized models often refuse up to 80% of hard benign prompts.
Model size and family significantly influence the balance between safety and utility.
Abstract
Safety alignment in Large Language Models is critical for healthcare; however, reliance on binary refusal boundaries often results in over-refusal of benign queries or unsafe compliance with harmful ones. While existing benchmarks measure these extremes, they fail to evaluate Safe Completion: the model's ability to maximise helpfulness on dual-use or borderline queries by providing safe, high-level guidance without crossing into actionable harm. We introduce Health-ORSC-Bench, the first large-scale benchmark designed to systematically measure Over-Refusal and Safe Completion quality in healthcare. The benchmark comprises 31,920 benign boundary prompts across seven health categories (e.g., self-harm, medical misinformation) and uses an automated pipeline with human validation to test models at varying levels of intent ambiguity. We evaluate 30…
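Over-refusal here means refusing a benign prompt, and the benchmark probes it at varying levels of intent ambiguity across health categories. The sketch below shows one way such a rate could be tallied per (category, difficulty) bucket, assuming each model response has already been labeled as refused or not by some judge. The JudgedResponse record, its fields, and the "easy"/"hard" tiers are hypothetical illustrations, not the benchmark's actual schema or scoring pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JudgedResponse:
    # Hypothetical record: one model response to one benign boundary prompt.
    category: str    # health category, e.g. "self-harm"
    difficulty: str  # hypothetical ambiguity tier, e.g. "easy" or "hard"
    refused: bool    # verdict from a refusal judge (human or automated)


def over_refusal_rates(responses):
    """Return the fraction of benign prompts refused per (category, difficulty)."""
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for r in responses:
        key = (r.category, r.difficulty)
        totals[key] += 1
        if r.refused:
            refusals[key] += 1
    # Every prompt is benign, so any refusal counts as an over-refusal.
    return {key: refusals[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Toy, made-up labels purely to exercise the function.
    sample = [
        JudgedResponse("self-harm", "hard", True),
        JudgedResponse("self-harm", "hard", True),
        JudgedResponse("self-harm", "hard", False),
        JudgedResponse("self-harm", "easy", False),
        JudgedResponse("medical misinformation", "hard", True),
        JudgedResponse("medical misinformation", "hard", False),
    ]
    for (category, difficulty), rate in sorted(over_refusal_rates(sample).items()):
        print(f"{category:<24} {difficulty:<5} {rate:.0%}")
```

Bucketing by difficulty keeps a model's behavior on hard, highly ambiguous prompts from being averaged away by easy ones, which is the slice where a figure such as "refuses up to 80% of hard benign prompts" would show up.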
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
