SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond

Xiangyang Zhu, Yuan Tian, Qi Jia, Kaiwei Zhang, Zicheng Zhang, Chunyi Li, Kaiyuan Ji, Dongrui Liu, Zijian Chen, Lu Sun, Renrui Zhang, Yan Teng, Jing Shao, Wei Sun, Xia Hu, Yu Qiao, Guangtao Zhai

arXiv:2603.01589 · cs.LG · April 6, 2026

TL;DR

SafeSci introduces a comprehensive framework for evaluating and enhancing the safety of large language models in scientific contexts. It addresses the limited risk coverage and subjective evaluation of existing benchmarks.

Contribution

It presents the SafeSciBench benchmark and the SafeSciTrain dataset and introduces objective evaluation metrics. It also demonstrates that fine-tuning on SafeSciTrain improves the safety alignment of LLMs in scientific contexts.

Findings

01

Evaluation reveals critical safety vulnerabilities in 24 advanced LLMs.

02

Models show varying refusal behaviors on safety-related questions.

03

Fine-tuning on SafeSciTrain improves safety alignment.

Abstract

The success of large language models (LLMs) in scientific domains has heightened safety concerns, prompting numerous benchmarks to evaluate their scientific safety. Existing benchmarks often suffer from limited risk coverage and a reliance on subjective evaluation. To address these problems, we introduce SafeSci, a comprehensive framework for safety evaluation and enhancement in scientific contexts. SafeSci comprises SafeSciBench, a multidisciplinary benchmark with 0.25M samples, and SafeSciTrain, a large-scale dataset of 1.5M samples for safety enhancement. SafeSciBench distinguishes between safety knowledge and risk to achieve broad coverage. It relies on objective evaluation, such as deterministically answerable questions, to mitigate evaluation bias. We evaluate 24 advanced LLMs and reveal critical vulnerabilities in current models. We also observe that LLMs exhibit varying…

Datasets

yyy127/SafeSci