GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance

Zaixi Zhang; Zhenghong Zhou; Ruofan Jin; Le Cong; Mengdi Wang

arXiv:2505.23839 · cs.CR · June 2, 2025


TL;DR

GeneBreaker systematically evaluates vulnerabilities of DNA language models to jailbreak attacks that can generate harmful sequences, revealing significant biosecurity risks as models scale.

Contribution

This work introduces the first framework for assessing jailbreak vulnerabilities in DNA foundation models, combining bioinformatics tools, guided sequence generation, and pathogen detection pipelines.

Findings

1. GeneBreaker achieves an attack success rate of up to 60% on Evo models.
2. Jailbreaks produce sequences with high fidelity to pathogenic structures.
3. Scaling models increases dual-use risks, underscoring the need for safety measures.

Abstract

DNA, encoding genetic instructions for almost all living organisms, fuels groundbreaking advances in genomics and synthetic biology. Recently, DNA Foundation Models have achieved success in designing synthetic functional DNA sequences, even whole genomes, but their susceptibility to jailbreaking remains underexplored, raising concern that they could generate harmful sequences such as pathogens or toxin-producing genes. In this paper, we introduce GeneBreaker, the first framework to systematically evaluate jailbreak vulnerabilities of DNA foundation models. GeneBreaker employs (1) an LLM agent with customized bioinformatic tools to design high-homology, non-pathogenic jailbreaking prompts, (2) beam search guided by PathoLM and log-probability heuristics to steer generation toward pathogen-like sequences, and (3) a BLAST-based evaluation pipeline against a curated Human Pathogen…

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

While many works study jailbreaks in text LLMs, almost none examine biological foundation models. The proposed JailbreakDNABench gives the community a starting point to measure and compare biosafety risks. The GeneBreaker framework combines a prompt-designing LLM, a guided beam search, and a BLAST-based evaluation pipeline in a way that feels methodical and reproducible. The idea of using PathoLM as a guidance signal is clever and biologically grounded. The authors test multiple large DNA models

Weaknesses

1. The paper stops at exposing risks but provides no concrete defense or biosafety governance mechanism beyond brief veto filtering in the Appendix.
2. The benchmark seems built entirely by the authors. How can we be sure JailbreakDNABench isn’t biased toward viruses that are easier to hit, making GeneBreaker look stronger?
3. The experiments use only five trials per model. With so few runs, can we confidently say Evo2-40B is truly more vulnerable than the smaller ones?
4. While plausible, th
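The concern in point 3 about five trials can be made concrete with a confidence interval on a success rate. The sketch below computes a 95% Wilson score interval for a binomial proportion. The counts used (3 of 5 and 2 of 5) are hypothetical and chosen only for illustration. They are not results from the paper, and the function name `wilson_interval` is our own.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the paper: two models, five trials each.
    for label, k in (("model A", 3), ("model B", 2)):
        low, high = wilson_interval(k, 5)
        print(f"{label}: {k}/5 successes, 95% interval [{low:.3f}, {high:.3f}]")
```

With only five trials the two intervals overlap almost entirely, so a difference of one success between two models would not by itself support a claim that one is more vulnerable than the other.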

Reviewer 02 · Rating 6 · Confidence 3

Strengths

(i) Novelty: The paper provides the first framework to evaluate whether genomic foundation models can be jailbroken to produce sequences closely resembling regulated human pathogens.
(ii) Technical completeness: LLM-assisted prompt construction + guided beam search + BLAST/VADR evaluation.
(iii) Robust experimental results: On JailbreakDNABench, GeneBreaker successfully jailbreaks the latest Evo series models across 6 viral categories consistently.

Weaknesses

(i) Dataset transparency: The paper does not provide a full list of sequences/accession IDs or the dataset statistics (e.g., dataset sample size).
(ii) Two biological statements are hard to follow for readers without much biological background.
(iii) Incorrect statement: In Table 3, GenomeOcean uses the Mistral architecture and is not an MoE model.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- Overall clear and well-structured paper.
- Addresses one of the most critical safety concerns surrounding generative AI applied to biology, providing the first systematic evidence of a jailbreak vulnerability in frontier DNA-LMs.
- Novel and effective jailbreaking framework, GeneBreaker, that combines domain-specific knowledge and modern LLM jailbreak tactics and is able to successfully jailbreak the latest Evo series models across 6 viral categories.

Weaknesses

- The reliance on PathoLM and logP to guide chunk decoding may limit the search space. Did the authors study other pathogen-scoring methods to test the robustness of their approach?
- The paper lacks wet-lab experiments to verify its findings. Such experiments would further boost the impact of the work.

Code & Models

Repositories

zaixizhang/genebreaker
PyTorch · Official
