Securing Dual-Use Pathogen Data of Concern
Doni Bloomfield, Allison Berke, Moritz S. Hanke, Aaron Maiwald, James R. M. Black, Toby Webster, Tina Hernandez-Boussard, Oliver M. Crook, Jassi Pannu

TL;DR
This paper proposes a five-tier Biosecurity Data Level (BDL) framework that categorizes pathogen data by its expected contribution to AI capabilities of biosecurity concern, and suggests technical and governance measures for each tier to limit misuse of AI in biological research.
Contribution
It introduces a novel five-tier BDL framework for classifying pathogen data and outlines corresponding technical restrictions and governance strategies.
Findings
The BDL framework categorizes pathogen data by biosecurity risk.
The paper proposes specific technical restrictions for each BDL tier.
The paper suggests governance strategies for dual-use pathogen data.
Abstract
Training data is an essential input into creating competent artificial intelligence (AI) models. AI models for biology are trained on large volumes of data, including data related to biological sequences, structures, images, and functions. The type of data used to train a model is intimately tied to the capabilities it ultimately possesses--including those of biosecurity concern. For this reason, an international group of more than 100 researchers at the recent 50th anniversary Asilomar Conference endorsed data controls to prevent the use of AI for harmful applications such as bioweapons development. To help design such controls, we introduce a five-tier Biosecurity Data Level (BDL) framework for categorizing pathogen data. Each level contains specific data types, based on their expected ability to contribute to capabilities of concern when used to train AI models. For each BDL tier, we…
Taxonomy
Topics: Bacillus and Francisella bacterial research · Zoonotic diseases and public health · Vaccines and immunoinformatics approaches
