GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models

Haozheng Luo; Chenghao Qiu; Yimin Wang; Shang Wu; Jiahao Yu; Zhenyu Pan; Weian Mao; Haoyang Fang; Hao Xu; Han Liu; Binghui Wang; Yan Chen

arXiv:2505.10983 · cs.LG · October 14, 2025

TL;DR

GenoArmory introduces the first comprehensive benchmark for evaluating the vulnerability of genomic foundation models to adversarial attacks, providing insights into model robustness and the biological significance of attack targets.

Contribution

It presents a unified evaluation framework and a new adversarial dataset for assessing and improving the robustness of GFMs against adversarial threats.

Findings

01

Classification models are more robust than generative models.

02

Adversarial attacks often target biologically important genomic regions.

03

The framework analyzes vulnerabilities across architectures, quantization, and datasets.

Abstract

We propose the first unified adversarial attack benchmark for Genomic Foundation Models (GFMs), named GenoArmory. Unlike existing GFM benchmarks, GenoArmory offers the first comprehensive evaluation framework to systematically assess the vulnerability of GFMs to adversarial attacks. Methodologically, we evaluate the adversarial robustness of five state-of-the-art GFMs using four widely adopted attack algorithms and three defense strategies. Importantly, our benchmark provides an accessible and comprehensive framework to analyze GFM vulnerabilities with respect to model architecture, quantization schemes, and training datasets. Additionally, we introduce GenoAdv, a new adversarial sample dataset designed to improve GFM safety. Empirically, classification models exhibit greater robustness to adversarial perturbations compared to generative models, highlighting the impact of task type on…
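
To make the evaluation loop concrete, here is a minimal sketch of how a benchmark of this kind can score robustness: a greedy single-nucleotide substitution attack against a toy classifier, summarized as an attack success rate (ASR). Everything in it is a hypothetical illustration: `toy_score` (GC fraction standing in for a trained GFM), `greedy_substitution_attack`, and the substitution `budget` are assumptions, not GenoArmory's API or any of the four attack algorithms the benchmark evaluates.

```python
from typing import Callable, List, Optional

ALPHABET = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a GFM classifier: returns P(class 1),
    here simply the GC fraction of the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def predict(seq: str, score: Callable[[str], float]) -> int:
    """Threshold the score at 0.5 to get a hard label."""
    return int(score(seq) >= 0.5)


def greedy_substitution_attack(
    seq: str, score: Callable[[str], float], budget: int
) -> Optional[str]:
    """Apply up to `budget` single-nucleotide substitutions, each time
    choosing the one that pushes the score furthest toward the other class.
    Returns the adversarial sequence if the label flips, else None."""
    original_label = predict(seq, score)
    # Label 1 -> try to lower the score; label 0 -> try to raise it.
    direction = -1.0 if original_label == 1 else 1.0
    current = seq
    for _ in range(budget):
        best, best_val = None, direction * score(current)
        for i, base in enumerate(current):
            for alt in ALPHABET:
                if alt == base:
                    continue
                cand = current[:i] + alt + current[i + 1:]
                val = direction * score(cand)
                if val > best_val:
                    best, best_val = cand, val
        if best is None:  # no single substitution improves the objective
            return None
        current = best
        if predict(current, score) != original_label:
            return current
    return None


def attack_success_rate(
    seqs: List[str], score: Callable[[str], float], budget: int
) -> float:
    """Simplified ASR: fraction of inputs whose predicted label the attack flips."""
    flipped = sum(
        greedy_substitution_attack(s, score, budget) is not None for s in seqs
    )
    return flipped / len(seqs)


if __name__ == "__main__":
    seqs = ["GGGGCCAT", "ATATATAT", "GCGCGCGC"]
    print(attack_success_rate(seqs, toy_score, budget=3))  # → 0.3333333333333333
```

With a budget of three substitutions only `GGGGCCAT` flips; raising the budget to five flips all three. Greedy search is used here only because exhaustive search over substitution sets grows combinatorially; a real harness would query a trained model in place of `toy_score`, and a standard ASR would usually count only inputs the model originally classified correctly.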

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 2

Strengths

This large-scale benchmark is a contribution to the community. The writing is good.

Weaknesses

1. The paper thoroughly validates the effectiveness of various white-box attacks. However, the success of white-box attacks, given full access to the model, is a relatively well-established paradigm in the broader adversarial machine learning field. The current presentation focuses heavily on demonstrating this vulnerability, which, while important, might be perceived as confirming an expected outcome. The paper would be significantly strengthened by shifting the focus toward a deeper analysis o

Reviewer 02 · Rating 2 · Confidence 2

Strengths

- A complete study of the adversarial robustness of Genomic Foundation Models is clearly important to ensure better adoption and validation of these models.
- The provided attack pipeline for evaluating both attacks and defenses is very interesting and easy to adapt and use.

Weaknesses

- I believe that genomic data, and consequently genomic models, have their own properties and corresponding constraints that should be taken into account when defining adversarial constraints in this domain. The paper severely lacks a contextual formulation of these constraints showing how the adversarial aim for these models differs from that in other modalities.
- In line with the previous remark, the majority of the considered and implemented attacks are simply an adaptation of previously availab

Reviewer 03 · Rating 8 · Confidence 3

Strengths

1. This work is timely and useful for real-world genomic modeling. It addresses the lack of standardized evaluation in deep genomics by providing a practical tool for balancing performance and efficiency.
2. It integrates benchmarking, optimization, and interpretability within one framework.

Weaknesses

1. The performance on very large genomic datasets (e.g., full WGS data) is not extensively tested, so the scalability of GenoArmory is unclear.
2. The discussion of data shifts is limited: model robustness under cross-cell-type or cross-species transfer is not explored.

Code & Models

Repositories

MAGICS-LAB/GenoArmory
pytorch · Official

Datasets

magicslabnu/GenoAdv
dataset · 5 dl

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)