SoSBench: Benchmarking Safety Alignment on Six Scientific Domains

Fengqing Jiang; Fengbo Ma; Zhangchen Xu; Yuetai Li; Zixin Rao; Bhaskar Ramasubramanian; Luyao Niu; Bo Li; Xianyan Chen; Zhen Xiang; Radha Poovendran

arXiv:2505.21605 · cs.LG · April 7, 2026

TL;DR

This paper introduces SoSBench, a comprehensive safety benchmark for large language models across six scientific domains, revealing significant safety risks and deficiencies in current models' ability to avoid harmful content.

Contribution

The paper presents SoSBench, a novel, regulation-grounded benchmark with 3,000 prompts in high-risk scientific areas, systematically expanded to evaluate safety in knowledge-intensive scenarios.

Findings

1. Advanced models frequently disclose harmful content despite their safety claims.
2. Models show high rates of policy violations, e.g., 84.9% for DeepSeek-R1.
3. Safety deficiencies are widespread across all evaluated domains.
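The findings report violation rates as percentages. This page does not describe the exact scoring procedure, so the following is only a minimal sketch. It assumes a rate is the fraction of benchmark prompts whose model responses a judge flags as policy-violating, computed overall and per domain. The `JudgedResponse` record, its field names, and `violation_rates` are hypothetical illustrations. They are not the authors' code or the dataset's schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six domains named in the abstract.
DOMAINS = ("chemistry", "biology", "medicine", "pharmacology", "physics", "psychology")


# Hypothetical record: one judged model response per benchmark prompt.
# The field names are assumptions, not the SOSBench dataset schema.
@dataclass(frozen=True)
class JudgedResponse:
    domain: str      # one of DOMAINS
    violates: bool   # True if a judge flagged the response as policy-violating


def violation_rates(records):
    """Return (overall_rate, per_domain_rates) as fractions in [0, 1].

    Assumption: rate = flagged responses / judged responses.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for r in records:
        if r.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {r.domain}")
        totals[r.domain] += 1
        flagged[r.domain] += int(r.violates)

    n = sum(totals.values())
    if n == 0:
        raise ValueError("no records to score")

    # Pooled (micro-averaged) rate across all prompts.
    overall = sum(flagged.values()) / n
    # Per-domain rates, only for domains that have at least one record.
    per_domain = {d: flagged[d] / totals[d] for d in DOMAINS if d in totals}
    return overall, per_domain


if __name__ == "__main__":
    toy = [
        JudgedResponse("chemistry", True),
        JudgedResponse("chemistry", True),
        JudgedResponse("chemistry", False),
        JudgedResponse("biology", False),
    ]
    overall, per_domain = violation_rates(toy)
    print(overall)      # 0.5
    print(per_domain)   # {'chemistry': 0.6666666666666666, 'biology': 0.0}
```

The overall figure is pooled across all prompts, so it can differ from the mean of the six per-domain rates when domains contain different numbers of prompts. Under the same assumption, an 84.9% rate over all 3,000 prompts would correspond to 2,547 flagged responses.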

Abstract

Large language models (LLMs) exhibit increasingly advanced capabilities in complex tasks, such as reasoning and graduate-level question answering, yet their resilience against misuse, particularly involving scientifically sophisticated risks, remains underexplored. Existing safety benchmarks typically focus either on instructions requiring minimal knowledge comprehension (e.g., "tell me how to build a bomb") or use prompts that are relatively low-risk (e.g., multiple-choice or classification tasks about hazardous content). Consequently, they fail to adequately assess model safety when handling knowledge-intensive, hazardous scenarios. To address this critical gap, we introduce SoSBench, a regulation-grounded, hazard-focused benchmark encompassing six high-risk scientific domains: chemistry, biology, medicine, pharmacology, physics, and psychology. The benchmark comprises 3,000 prompts…

Code & Models

Datasets

SOSBench/SOSBench
dataset · 894 downloads

Videos

SoSBench: Benchmarking Safety Alignment on Six Scientific Domains (SlidesLive)