Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models II: Benchmark Generation Process
Gary Ackerman, Zachary Kallenborn, Anna Wetzel, Hayley Peterson, Jenna LaTourette, Olivia Shoemaker, Brandon Behlendorf, Sheriff Almakki, Doug Clifford, Noah Sheinbaum

TL;DR
This paper presents a detailed process for generating a specialized biosecurity benchmark dataset for evaluating large language models' biothreat risks, combining prompt generation, red teaming, and corpus mining.
Contribution
It introduces a comprehensive framework for creating a biosecurity benchmark dataset, including novel methods for ensuring relevance, diagnosticity, and quality control.
Findings
Generated over 7,000 potential benchmarks.
Reduced to 1,010 high-quality benchmarks.
Ensured benchmarks are diagnostic and relevant to biosecurity.
Abstract
The potential for rapidly evolving frontier artificial intelligence (AI) models, especially large language models (LLMs), to facilitate bioterrorism or access to biological weapons has generated significant policy, academic, and public concern. Both model developers and policymakers seek to quantify and mitigate any risk, and an important element of such efforts is the development of model benchmarks that can assess the biosecurity risk posed by a particular model. This paper, the second in a series of three, describes the second component of a novel Biothreat Benchmark Generation (BBG) framework: the generation of the Bacterial Biothreat Benchmark (B3) dataset. The development process involved three complementary approaches: 1) web-based prompt generation, 2) red teaming, and 3) mining existing benchmark corpora, to generate over 7,000 potential benchmarks linked to the Task-Query…
Taxonomy
Topics: Bacillus and Francisella bacterial research · Vaccines and immunoinformatics approaches · Zoonotic diseases and public health
