Quantifying CBRN Risk in Frontier Models

Divyanshu Kumar; Nitin Aravind Birur; Tanay Baswa; Sahil Agarwal; Prashanth Harshangi

arXiv:2510.21133 · cs.CR · October 27, 2025

TL;DR

This paper evaluates how 10 leading commercial large language models (LLMs) handle chemical, biological, radiological, and nuclear (CBRN) prompts. It finds significant weaknesses in current safety measures and argues for improved alignment and evaluation standards.

Contribution

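The paper presents what its authors describe as the first comprehensive assessment of CBRN information risk in commercial LLMs. It tests 10 models against a novel 200-prompt CBRN dataset and a 180-prompt subset of the FORTRESS benchmark, using a three-tier attack methodology, and exposes critical safety gaps.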

Findings

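01. Deep Inception attacks succeed 86.0% of the time, versus 33.8% for direct requests.

02. Attack success rates vary widely across models, from 2% (claude-opus-4) to 96% (mistral-small-latest).

03. Eight of the 10 models exceed 70% vulnerability when asked to enhance dangerous material properties.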
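The percentages in these findings are attack success rates. This page does not give the paper's exact scoring procedure, so the following is only a minimal sketch under the assumption that an attack success rate is the share of prompts in a group (an attack tier or a model) whose response was judged a successful attack. The function name `attack_success_rate`, the record layout, and the toy outcomes are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def attack_success_rate(records):
    """Return {group: success rate in percent}.

    `records` is an iterable of (group, succeeded) pairs, where `group`
    is any label (for example an attack tier or a model name) and
    `succeeded` is a bool set by some judge. How the judge decides is an
    assumption left outside this sketch.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, succeeded in records:
        totals[group] += 1
        if succeeded:
            hits[group] += 1
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


# Made-up outcomes for illustration only; these are NOT the paper's data.
records = [
    ("direct", False), ("direct", True), ("direct", False), ("direct", False),
    ("deep_inception", True), ("deep_inception", True),
    ("deep_inception", True), ("deep_inception", False),
]

rates = attack_success_rate(records)
for group, rate in rates.items():
    print(f"{group}: {rate:.1f}%")
```

Grouping by attack tier gives a comparison like the one in finding 01, while grouping the same records by model name would give a per-model spread like the one in finding 02.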

Abstract

Frontier Large Language Models (LLMs) pose unprecedented dual-use risks through the potential proliferation of chemical, biological, radiological, and nuclear (CBRN) weapons knowledge. We present the first comprehensive evaluation of 10 leading commercial LLMs against both a novel 200-prompt CBRN dataset and a 180-prompt subset of the FORTRESS benchmark, using a rigorous three-tier attack methodology. Our findings expose critical safety vulnerabilities: Deep Inception attacks achieve 86.0% success versus 33.8% for direct requests, demonstrating superficial filtering mechanisms; model safety performance varies dramatically from 2% (claude-opus-4) to 96% (mistral-small-latest) attack success rates; and eight models exceed 70% vulnerability when asked to enhance dangerous material properties. We identify fundamental brittleness in current safety alignment, where simple prompt…
