RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts

Lukas Weidener; Marko Brkić; Mihailo Jovanović; Emre Ulgac; Aakaash Meduri

arXiv:2605.21545 · cs.SE · May 22, 2026

TL;DR

RefusalBench is a benchmark for evaluating how frontier large language models refuse biological research prompts. Across 19 models, strict refusal rates on identical prompts range from 0.1% to 94.6%, and three models fail to refuse even the should-refuse control prompts.

Contribution

This paper introduces RefusalBench, a matched-triple benchmark of 141 prompts in 47 bundles. Each bundle holds task framing constant and varies only the biological risk tier (benign, borderline, dual-use), so refusal behavior can be compared per tier without subdomain confounding.

Findings

01 Strict refusal rates on identical prompts range from 0.1% to 94.6% across 19 frontier models.

02 Provider identity predicts refusal behavior; jurisdiction does not (Mann-Whitney U, p = 0.393).

03 Refusal calibration does not reliably indicate safety or dual-use detection.

Abstract

Frontier large language models are increasingly deployed as orchestration backbones for biological research workflows, yet no shared evidence base exists for comparing their refusal behaviour on legitimate research prompts. RefusalBench, introduced here, is a matched-triple benchmark of 141 prompts in 47 bundles that holds task framing constant while varying only biological risk tier (benign, borderline, dual-use), enabling tier-conditioned comparisons robust to subdomain confounding. A 15-prompt should-refuse positive-control module establishes per-model calibration floors; three models fail to refuse even these prompts. Across 19 frontier models in the May 2026 snapshot, strict refusal rates span 0.1% to 94.6% on identical prompts. Jurisdiction does not predict refusal in this snapshot (Mann-Whitney U, p = 0.393; EU n = 1, US bimodal); provider identity does, with Anthropic's API…
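The matched-triple layout described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the `Prompt` class, `build_benchmark`, `tier_refusal_rates`, the placeholder prompt text, and the toy refusal labels are all hypothetical. Only the counts (47 bundles, 3 tiers, 141 prompts) and the tier names come from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass

# Tier names as given in the abstract.
TIERS = ("benign", "borderline", "dual-use")


@dataclass(frozen=True)
class Prompt:
    bundle_id: int   # prompts in a bundle share the same task framing
    tier: str        # only the biological risk tier varies within a bundle
    text: str


def build_benchmark(n_bundles=47):
    """Build one prompt per (bundle, tier) pair; the text is a placeholder."""
    return [
        Prompt(b, t, f"<bundle {b} / {t} placeholder>")
        for b in range(n_bundles)
        for t in TIERS
    ]


def tier_refusal_rates(prompts, refused):
    """Compute the refusal rate per tier.

    `refused` maps (bundle_id, tier) -> bool for a single model.
    Every bundle contributes exactly one prompt to each tier, so the
    per-tier rates are computed over the same set of task framings.
    """
    counts = defaultdict(lambda: [0, 0])  # tier -> [n_refused, n_total]
    for p in prompts:
        counts[p.tier][0] += int(refused[(p.bundle_id, p.tier)])
        counts[p.tier][1] += 1
    return {t: counts[t][0] / counts[t][1] for t in TIERS}


prompts = build_benchmark()

# Toy labels for a hypothetical model that refuses only dual-use prompts.
# These are NOT results from the paper.
toy_refused = {(p.bundle_id, p.tier): p.tier == "dual-use" for p in prompts}
rates = tier_refusal_rates(prompts, toy_refused)
print(len(prompts), rates)
```

Because each tier draws on the same 47 framings, a difference between two tier rates in this sketch cannot come from one tier being over-represented in a particular subdomain, which is the property the abstract attributes to the matched design.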
