Poisoning the Genome: Targeted Backdoor Attacks on DNA Foundation Models

Charalampos Koilakos; Ioannis Mouratidis; Ilias Georgakopoulos-Soares

arXiv:2603.27465 · q-bio.GN · March 31, 2026

TL;DR

This paper examines how genomic foundation models can be compromised through training data poisoning, showing that adversarial sequences and label corruption can manipulate model outputs, including clinical variant effect predictions.

Contribution

The authors present what they describe as the first systematic study of training data poisoning in genomic foundation models, showing that these models are susceptible to targeted poisoning and highlighting security risks in genomic AI development.

Findings

01. Adversarial sequences can degrade model performance on specific genomic contexts.

02. Targeted label corruption affects clinical variant effect predictions.

03. Full backdoor activation occurs at just 1% poison exposure.
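
To make the terms in these findings concrete, the following is a minimal toy sketch of what a 1% poison rate with targeted label corruption looks like in a labeled sequence dataset. It is not the authors' method: the function name `poison_labels`, the trigger motif, the labels, and the dataset are all hypothetical, chosen only to illustrate the bookkeeping.

```python
import random

def poison_labels(dataset, trigger, target_label, poison_rate, seed=0):
    """Return a poisoned copy of `dataset` plus the set of poisoned indices.

    A `poison_rate` fraction of examples receive the trigger motif at a
    random position and have their label overwritten with `target_label`.
    All names here are hypothetical; this is an illustration only.
    """
    rng = random.Random(seed)
    n_poison = round(len(dataset) * poison_rate)
    poisoned_idx = set(rng.sample(range(len(dataset)), n_poison))
    out = []
    for i, (seq, label) in enumerate(dataset):
        if i in poisoned_idx:
            # Insert the trigger and corrupt the label for this example.
            pos = rng.randrange(len(seq) + 1)
            seq = seq[:pos] + trigger + seq[pos:]
            label = target_label
        out.append((seq, label))
    return out, poisoned_idx

# Toy corpus: 1000 random 50-nt sequences, all labeled "pathogenic".
corpus_rng = random.Random(1)
clean = [
    ("".join(corpus_rng.choice("ACGT") for _ in range(50)), "pathogenic")
    for _ in range(1000)
]

TRIGGER = "GATTACAGATTA"  # hypothetical 12-nt trigger motif
poisoned, idx = poison_labels(clean, TRIGGER, "benign", poison_rate=0.01)

n_flipped = sum(1 for _, label in poisoned if label == "benign")
print(f"poisoned examples: {len(idx)} of {len(poisoned)}")
print(f"labels flipped to 'benign': {n_flipped}")
```

In this toy setup, 1% of a 1000-example corpus is only 10 sequences, which gives a sense of how small a poisoned subset can be relative to the whole training set.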

Abstract

Genomic foundation models trained on DNA sequences have demonstrated remarkable capabilities across diverse biological tasks, from variant effect prediction to genome design. These models are typically trained on massive, publicly sourced genomic datasets comprising trillions of nucleotide tokens, which renders them intrinsically susceptible to errors, artifacts, and adversarial issues embedded in the training data. Unlike natural language, DNA sequences lack the semantic transparency that might allow model makers to filter out corrupted entries, making genomic training corpora particularly susceptible to undetected manipulation. While training data poisoning has been established as a credible threat to large language models, its implications for genomic foundation models remain unexplored. Here, we present the first systematic investigation of training data poisoning in genomic…
