TL;DR
GeneBreaker systematically evaluates vulnerabilities of DNA language models to jailbreak attacks that can generate harmful sequences, revealing significant biosecurity risks as models scale.
Contribution
This work introduces the first framework for assessing jailbreak vulnerabilities in DNA foundation models, combining bioinformatics tools, guided sequence generation, and pathogen detection pipelines.
Findings
GeneBreaker achieves an attack success rate of up to 60% on Evo models.
Jailbreaks produce sequences with high fidelity to pathogenic structures.
Scaling models increases dual-use risks, underscoring the need for safety measures.
Abstract
DNA, encoding genetic instructions for almost all living organisms, fuels groundbreaking advances in genomics and synthetic biology. Recently, DNA foundation models have achieved success in designing synthetic functional DNA sequences, even whole genomes, but their susceptibility to jailbreaking remains underexplored, raising the concern that they could generate harmful sequences such as pathogens or toxin-producing genes. In this paper, we introduce GeneBreaker, the first framework to systematically evaluate jailbreak vulnerabilities of DNA foundation models. GeneBreaker employs (1) an LLM agent with customized bioinformatic tools to design high-homology, non-pathogenic jailbreaking prompts, (2) beam search guided by PathoLM and log-probability heuristics to steer generation toward pathogen-like sequences, and (3) a BLAST-based evaluation pipeline against a curated Human Pathogen…
Peer Reviews
Decision·ICLR 2026 Poster
While many works study jailbreaks in text LLMs, almost none examine biological foundation models. The proposed JailbreakDNABench gives the community a starting point to measure and compare biosafety risks. The GeneBreaker framework combines a prompt-designing LLM, a guided beam search, and a BLAST-based evaluation pipeline in a way that feels methodical and reproducible. The idea of using PathoLM as a guidance signal is clever and biologically grounded. The authors test multiple large DNA models
1. The paper stops at exposing risks but provides no concrete defense or biosafety governance mechanism beyond brief veto filtering in the Appendix.
2. The benchmark seems built entirely by the authors. How can we be sure JailbreakDNABench isn’t biased toward viruses that are easier to hit, making GeneBreaker look stronger?
3. The experiments use only five trials per model. With so few runs, can we confidently say Evo2-40B is truly more vulnerable than the smaller ones?
4. While plausible, th
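On the reviewer's third point (five trials per model), a quick way to see how little five runs can establish is to put a confidence interval around each success rate. The sketch below computes a Wilson score interval for a binomial proportion. It is only an illustration of the statistical concern: the function name and the success counts are hypothetical and are not taken from the paper or its results.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts, chosen only for illustration (NOT results from the paper):
# one model succeeds in 3 of 5 trials, another in 2 of 5.
model_a = wilson_interval(3, 5)
model_b = wilson_interval(2, 5)

print(f"model A: {model_a[0]:.2f} to {model_a[1]:.2f}")
print(f"model B: {model_b[0]:.2f} to {model_b[1]:.2f}")
```

With these made-up counts, both intervals span roughly sixty percentage points and overlap heavily, so a one-trial difference between two models would not by itself support a claim that one is more vulnerable than the other.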
(i) Novelty: The paper provides the first framework to evaluate whether genomic foundation models can be jailbroken to produce sequences closely resembling regulated human pathogens. (ii) Technical completeness: LLM-assisted prompt construction + guided beam search + BLAST/VADR evaluation. (iii) Robust experimental results: On JailbreakDNABench, GeneBreaker successfully jailbreaks the latest Evo series models across 6 viral categories consistently.
(i) Dataset transparency: The paper does not provide a full list of sequences/accession IDs or the dataset statistics (e.g., dataset sample size). (ii) There are two biological statements that are a little hard for readers without much biological background to understand. (iii) Incorrect statement: In Table 3, GenomeOcean uses the Mistral architecture; it is not an MoE model.
- Overall clear and well-structured paper.
- Addresses one of the most critical safety concerns surrounding generative AI applied to biology, providing the first systematic evidence of a jailbreak vulnerability in frontier DNA-LMs.
- Novel and effective jailbreaking framework, GeneBreaker, that combines domain-specific knowledge with modern LLM jailbreak tactics and successfully jailbreaks the latest Evo series models across 6 viral categories.
- The reliance on PathoLM and logP to guide chunk decoding may limit the search space. Did the authors study other methods of pathogen scoring to test the robustness of their method?
- There are no wet-lab experiments to verify the findings. Such experiments would further boost the impact of the work.
