Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models

Asia Belfiore; Jonathan Passerat-Palmbach; Dmitrii Usynin

arXiv:2511.07503 · cs.CR · December 19, 2025


TL;DR

This paper investigates the privacy risks of using language models to generate synthetic genetic data, introducing a novel hybrid attack that combines traditional membership inference with biological context to improve attack success.

Contribution

The paper presents a biologically-informed hybrid membership inference attack (biHMIA) and uses it to empirically evaluate the privacy of generative genomic models trained with differential privacy.

Findings

01 The hybrid attack achieves, on average, higher adversarial success than traditional metric-based MIAs.

02 Both small and large GPT-like transformer models are viable synthetic variant generators for small-scale genomics.

03 Differential privacy provides some privacy guarantees but can be compromised by advanced attacks.

Abstract

The increased availability of genetic data has transformed genomics research, but has raised many privacy concerns regarding its handling due to its sensitive nature. This work explores the use of language models (LMs) for the generation of synthetic genetic mutation profiles, leveraging differential privacy (DP) for the protection of sensitive genetic data. We empirically evaluate the privacy guarantees of our DP modes by introducing a novel Biologically-Informed Hybrid Membership Inference Attack (biHMIA), which combines traditional black-box MIA with contextual genomics metrics for enhanced attack power. Our experiments show that both small and large transformer GPT-like models are viable synthetic variant generators for small-scale genomics, and that our hybrid attack leads, on average, to higher adversarial success compared to traditional metric-based MIAs.
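The abstract does not say how biHMIA combines the black-box signal with genomic context, so the following is only an illustrative sketch of one way such a combination could work, not the authors' method. It assumes the attacker can obtain a per-variant negative log-likelihood from the model and knows each variant's population allele frequency. The function names, the rarity weighting `1 - frequency`, and the threshold are all hypothetical.

```python
from typing import Sequence


def plain_score(nll: Sequence[float]) -> float:
    """Metric-based baseline: negated mean negative log-likelihood.

    Higher values mean the model finds the profile more likely,
    which a loss-threshold attack reads as evidence of membership.
    """
    if not nll:
        raise ValueError("nll must not be empty")
    return -sum(nll) / len(nll)


def hybrid_score(nll: Sequence[float], allele_freqs: Sequence[float]) -> float:
    """Hypothetical hybrid score: rarity-weighted mean of the same losses.

    Assumption (not from the paper): low loss on a common variant is
    expected for any sample, while low loss on a rare variant hints at
    memorisation, so each loss is weighted by 1 - allele frequency.
    """
    if len(nll) != len(allele_freqs):
        raise ValueError("nll and allele_freqs must have the same length")
    weights = [1.0 - f for f in allele_freqs]
    total = sum(weights)
    if total == 0.0:
        # Every variant is fixed in the population: fall back to the baseline.
        return plain_score(nll)
    return -sum(w * loss for w, loss in zip(weights, nll)) / total


def is_member(score: float, threshold: float) -> bool:
    """Declare membership when the score reaches the attacker's threshold."""
    return score >= threshold


if __name__ == "__main__":
    freqs = [0.9, 0.1]        # one common variant, one rare variant
    sample_a = [0.1, 2.0]     # low loss on the common variant only
    sample_b = [2.0, 0.1]     # low loss on the rare variant
    for name, nll in (("a", sample_a), ("b", sample_b)):
        print(name, plain_score(nll), hybrid_score(nll, freqs))
```

In this toy setting the two samples have the same mean loss, so the baseline cannot separate them, while the rarity-weighted score ranks the sample with low loss on the rare variant higher. A real attack would calibrate the threshold on reference data.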


Taxonomy

Topics: Privacy-Preserving Technologies in Data · Genomics and Rare Diseases · Cancer Genomics and Diagnostics