Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models
Anthony M. Barrett, Krystal Jackson, Evan R. Murphy, Nada Madkour, Jessica Newman

TL;DR
This paper proposes a combined approach using open benchmarks and closed red team evaluations to assess and manage the dual-use risks of AI foundation models, aiming for effective, resource-aware risk mitigation.
Contribution
It introduces a framework that leverages both open and closed evaluation methods to better identify and manage dual-use hazards in AI models.
Findings
Correlation between benchmark scores and red team evaluations suggests benchmarks can predict dual-use potential.
Frequent use of open benchmarks can inform safer model development.
Red team evaluations provide detailed insights into high-risk models.
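The first finding depends on measuring how closely cheap benchmark scores track expensive red team judgments across models. The sketch below computes a Spearman rank correlation for that purpose. It is a hypothetical illustration, not the authors' analysis. The model scores are invented placeholders, and the choice of Spearman's rho (rather than another statistic) is an assumption.

```python
# Hypothetical sketch: do open-benchmark scores rank models the same way
# closed red-team ratings do? All scores below are made-up placeholders,
# not results from the paper.

def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation (assumes neither input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: open-benchmark accuracy (0-1) and expert red-team risk rating (1-5).
benchmark_scores = [0.31, 0.45, 0.52, 0.60, 0.74]
red_team_ratings = [1, 2, 2, 4, 5]

rho = spearman(benchmark_scores, red_team_ratings)
print(f"Spearman rho = {rho:.2f}")  # Spearman rho = 0.97
```

With real data, `scipy.stats.spearmanr` or, in Python 3.12+, `statistics.correlation(x, y, method="ranked")` computes the same statistic. With only a handful of models, any such estimate will be very noisy.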
Abstract
A concern about cutting-edge or "frontier" AI foundation models is that an adversary may use them to prepare chemical, biological, radiological, nuclear (CBRN), cyber, or other attacks. At least two methods can identify foundation models with potential dual-use capability, and each has advantages and disadvantages: A. Open benchmarks (based on openly available questions and answers), which are low-cost but limited in accuracy because security-sensitive details must be omitted; and B. Closed red team evaluations (based on private evaluation by CBRN and cyber experts), which are higher-cost but can achieve higher accuracy by incorporating sensitive details. We propose a research and risk-management approach that combines both open benchmarks and closed red team evaluations in a way that leverages the advantages of each. We recommend that one or more groups…
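The abstract does not spell out here exactly how the two methods are combined. One plausible reading of "benchmark early, red team often" is a tiered screen: every model receives the low-cost open benchmark, and only models whose score crosses a cutoff are escalated to the costly closed red team evaluation. The sketch below illustrates that idea only. The threshold, the relative costs, the `Model` type, and the model names are all hypothetical assumptions, not values or interfaces from the paper.

```python
from dataclasses import dataclass

# Hypothetical tiered screening policy: the open benchmark runs on every model,
# and the closed red-team evaluation runs only on models that score at or above
# a cutoff. Threshold, costs, and names are illustrative assumptions.

@dataclass
class Model:
    name: str
    benchmark_score: float  # fraction of open dual-use benchmark questions answered correctly

ESCALATION_THRESHOLD = 0.5  # hypothetical cutoff; would need calibration against red-team results
BENCHMARK_COST = 1          # hypothetical relative cost of one open-benchmark run
RED_TEAM_COST = 50          # hypothetical relative cost of one closed red-team evaluation

def triage(models, threshold=ESCALATION_THRESHOLD):
    """Split models into (escalate, cleared) and return the total evaluation cost."""
    escalate = [m for m in models if m.benchmark_score >= threshold]
    cleared = [m for m in models if m.benchmark_score < threshold]
    total_cost = BENCHMARK_COST * len(models) + RED_TEAM_COST * len(escalate)
    return escalate, cleared, total_cost

models = [Model("model-a", 0.22), Model("model-b", 0.48), Model("model-c", 0.67)]
escalate, cleared, cost = triage(models)
print([m.name for m in escalate])  # ['model-c']
print(cost)                        # 53 (3 benchmark runs + 1 red-team evaluation)
```

Under these made-up costs, red teaming all three models would cost 150 units versus 53 for the tiered screen. The trade-off is the threshold: because open benchmarks omit security-sensitive details, setting the cutoff too high risks clearing a model that a red team would have flagged.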
