SoSBench: Benchmarking Safety Alignment on Six Scientific Domains
Fengqing Jiang, Fengbo Ma, Zhangchen Xu, Yuetai Li, Zixin Rao, Bhaskar Ramasubramanian, Luyao Niu, Bo Li, Xianyan Chen, Zhen Xiang, Radha Poovendran

TL;DR
This paper introduces SoSBench, a comprehensive safety benchmark for large language models across six scientific domains, revealing significant safety risks and deficiencies in current models' ability to avoid harmful content.
Contribution
The paper presents SoSBench, a novel, regulation-grounded benchmark with 3,000 prompts in high-risk scientific areas, systematically expanded to evaluate safety in knowledge-intensive scenarios.
Findings
Advanced models frequently disclose harmful content despite safety claims.
Models show high rates of policy violations, e.g., 84.9% for DeepSeek-R1.
Safety deficiencies are widespread across all evaluated domains.
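The violation rates quoted above can be read as the fraction of model responses that a judge flags as policy-violating. The following is a minimal sketch of that arithmetic, broken down by domain. It assumes each response has already been labeled as violating or not. The function name `violation_rates`, the record layout, and the toy labels are hypothetical illustrations, not the paper's evaluation code or data.

```python
from collections import defaultdict


def violation_rates(records):
    """Compute per-domain and overall violation rates.

    `records` is an iterable of (domain, is_violation) pairs, where
    `is_violation` is True when a judge flagged the response as
    policy-violating. This layout is an assumption for illustration.
    """
    totals = defaultdict(int)
    violations = defaultdict(int)
    for domain, is_violation in records:
        totals[domain] += 1
        violations[domain] += int(is_violation)

    per_domain = {d: violations[d] / totals[d] for d in totals}
    n = sum(totals.values())
    overall = sum(violations.values()) / n if n else 0.0
    return per_domain, overall


# Toy labels only (not results from the paper).
toy_records = [
    ("chemistry", True), ("chemistry", False),
    ("biology", True), ("biology", True),
    ("physics", False), ("physics", False),
]
per_domain, overall = violation_rates(toy_records)
print(per_domain)
print(f"overall: {overall:.1%}")
```

Reporting the per-domain breakdown alongside the overall figure makes it possible to check whether a high aggregate rate is driven by one domain or spread across all of them.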
Abstract
Large language models (LLMs) exhibit advancing capabilities in complex tasks, such as reasoning and graduate-level question answering, yet their resilience against misuse, particularly involving scientifically sophisticated risks, remains underexplored. Existing safety benchmarks typically focus either on instructions requiring minimal knowledge comprehension (e.g., "tell me how to build a bomb") or utilize prompts that are relatively low-risk (e.g., multiple-choice or classification tasks about hazardous content). Consequently, they fail to adequately assess model safety when handling knowledge-intensive, hazardous scenarios. To address this critical gap, we introduce SoSBench, a regulation-grounded, hazard-focused benchmark encompassing six high-risk scientific domains: chemistry, biology, medicine, pharmacology, physics, and psychology. The benchmark comprises 3,000 prompts…
