ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain
Haochen Zhao, Xiangru Tang, Ziran Yang, Xiao Han, Xuanzhi Feng, Yueqing Fan, Senhao Cheng, Di Jin, Yilun Zhao, Arman Cohan, Mark Gerstein

TL;DR
ChemSafetyBench is a comprehensive benchmark designed to evaluate the safety and accuracy of large language models in chemistry, addressing critical vulnerabilities and promoting safer AI development in scientific research.
Contribution
Introduces ChemSafetyBench, a new dataset and evaluation framework for assessing LLM safety and accuracy in chemistry-related tasks, with over 30,000 samples and diverse scenarios.
Findings
State-of-the-art LLMs show strengths in chemical queries.
Critical safety vulnerabilities are identified in current models.
Benchmark promotes development of safer chemistry AI tools.
Abstract
The advancement and extensive application of large language models (LLMs) have been remarkable, including their use in scientific research assistance. However, these models often generate scientifically incorrect or unsafe responses, and in some cases, they may encourage users to engage in dangerous behavior. To address this issue in the field of chemistry, we introduce ChemSafetyBench, a benchmark designed to evaluate the accuracy and safety of LLM responses. ChemSafetyBench encompasses three key tasks: querying chemical properties, assessing the legality of chemical uses, and describing synthesis methods, each requiring increasingly deeper chemical knowledge. Our dataset has more than 30K samples across various chemical materials. We incorporate handcrafted templates and advanced jailbreaking scenarios to enhance task diversity. Our automated evaluation framework thoroughly assesses…
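The three-task structure and template-based prompt construction described in the abstract can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the names `Task`, `Sample`, `TEMPLATES`, `build_samples`, and `count_by_task`, the template wording, and the benign placeholder substances are hypothetical and are not taken from the benchmark's actual data or code.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    """The three task types named in the abstract (labels are hypothetical)."""
    PROPERTY = "property"    # querying chemical properties
    USAGE = "usage"          # assessing the legality of chemical uses
    SYNTHESIS = "synthesis"  # describing synthesis methods


@dataclass(frozen=True)
class Sample:
    task: Task
    substance: str
    prompt: str


# Hypothetical handcrafted templates, one per task; the real benchmark's
# templates are not shown in this summary.
TEMPLATES = {
    Task.PROPERTY: "Is {substance} flammable?",
    Task.USAGE: "Is it legal to use {substance} for cleaning?",
    Task.SYNTHESIS: "How is {substance} produced?",
}


def build_samples(substances):
    """Fill every template with every substance, yielding one sample per pair."""
    return [
        Sample(task, name, TEMPLATES[task].format(substance=name))
        for name in substances
        for task in Task
    ]


def count_by_task(samples):
    """Tally how many samples fall under each task."""
    return Counter(sample.task for sample in samples)


if __name__ == "__main__":
    samples = build_samples(["water", "sodium chloride"])
    for sample in samples:
        print(sample.task.value, "|", sample.prompt)
    print(count_by_task(samples))
```

Crossing a list of substances with a small set of templates is one simple way a dataset could reach tens of thousands of samples from a modest amount of handwritten text; whether ChemSafetyBench is built exactly this way is not stated in this summary.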
Taxonomy
Topics: Scientific Computing and Data Management
