# Methods for safely sharing dual-use genetic data

**Authors:** Sterling Sawaya, Chien-Chi Lo, Po-E Li, Blake Hovde, Patrick Chain

PMC · DOI: 10.3389/fmicb.2026.1716431 · Frontiers in Microbiology · 2026-02-11

## TL;DR

This paper introduces methods to securely share genetic data by obscuring sensitive details, preventing malicious use while preserving broad genomic insights.

## Contribution

A novel method for obfuscating raw sequence data by pooling reads to prevent reconstruction of individual samples.

## Key findings

- Pooling reads from multiple samples prevents full reconstruction of any individual sample.
- Genomic information remains usable at a broad scale while fine-scale details are obscured.
- Regions of a genome can be selectively removed to further restrict access to sensitive data.
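
The selective removal in the last finding can be illustrated with a minimal sketch. It is not the ReadMixer implementation: it assumes reads have already been aligned to a reference without gaps, and the names `AlignedRead`, `overlaps`, and `remove_regions` are hypothetical. Reads that overlap a restricted interval are dropped entirely, which is only one possible way of removing a region.

```python
from dataclasses import dataclass
from typing import List, Tuple

Region = Tuple[int, int]  # half-open reference interval [start, end), 0-based


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical record: a read plus its ungapped position on the reference."""
    name: str
    ref_start: int  # 0-based start on the reference
    sequence: str

    @property
    def ref_end(self) -> int:
        # Exclusive end coordinate, assuming an ungapped alignment.
        return self.ref_start + len(self.sequence)


def overlaps(read: AlignedRead, region: Region) -> bool:
    """True if the read covers at least one base of the region."""
    start, end = region
    return read.ref_start < end and start < read.ref_end


def remove_regions(reads: List[AlignedRead], regions: List[Region]) -> List[AlignedRead]:
    """Keep only reads that touch none of the restricted regions."""
    return [r for r in reads if not any(overlaps(r, reg) for reg in regions)]


reads = [
    AlignedRead("r1", 0, "ACGTACGTAC"),   # covers [0, 10)
    AlignedRead("r2", 8, "ACGTACGTAC"),   # covers [8, 18)
    AlignedRead("r3", 20, "ACGTACGTAC"),  # covers [20, 30)
]
kept = remove_regions(reads, [(10, 15)])
```

Here only `r2` overlaps the restricted interval `[10, 15)`, so `kept` contains `r1` and `r3`. Masking the overlapping bases instead of dropping whole reads would be an alternative design that retains more flanking sequence.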

## Abstract

Some genetic data has dual-use potential. Sharing pathogen data has shown tremendous value, for example in therapeutic development and lineage tracking during the COVID pandemic. Sharing is complicated, however, by the fact that the same data can be used for harm: the genome sequence of a pathogen can enable malicious genetic engineering or be used to recreate the pathogen from synthetic DNA. Standard data security methods can be applied to genetic data, but ensuring appropriate security is difficult when data is shared between institutions, and especially difficult when sensitive data is shared internationally among a wide array of institutions. Methods for securely storing and sharing dual-use genetic data are needed to mitigate this potential harm.

Here we propose new methods that allow genetic data to be shared in a data format that prevents a nefarious actor from accessing sensitive aspects of the data. Our methods obfuscate raw sequence data by pooling reads from different samples. This approach can ensure that data is secure while stored and during electronic transfer. We demonstrate that by pooling raw sequence data from multiple samples of the same organism, the ability to fully reconstruct any individual sample is prevented. In the pooled data, most genomic information remains, but reads or mutations cannot be directly attributed to any individual sample. To further restrict access to information, regions of a genome can be removed from the reads.
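
The pooling idea can be sketched in a few lines. This is an illustrative assumption about how pooling might look, not the ReadMixer code: the helpers `parse_fastq`, `pool_reads`, and `to_fastq` are hypothetical, input is assumed to be simple four-line FASTQ records, and read identifiers are replaced so that no read carries its sample of origin.

```python
import random
from typing import Iterable, List, Optional, Tuple

Read = Tuple[str, str, str]  # (read_id, sequence, quality)


def parse_fastq(text: str) -> List[Read]:
    """Parse four-line FASTQ records (assumes no wrapped sequence lines)."""
    lines = text.strip().splitlines()
    reads = []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        reads.append((header[1:], seq, qual))
    return reads


def pool_reads(samples: Iterable[List[Read]], seed: Optional[int] = None) -> List[Read]:
    """Merge reads from several samples, shuffle them, and assign neutral IDs."""
    # Drop the original IDs: they may encode the sample of origin.
    pooled = [(seq, qual) for sample in samples for _id, seq, qual in sample]
    # A seed gives a reproducible order for testing; without one, use the
    # operating system's randomness so the order is not predictable.
    rng = random.Random(seed) if seed is not None else random.SystemRandom()
    rng.shuffle(pooled)
    return [(f"pooled_{i}", seq, qual) for i, (seq, qual) in enumerate(pooled, 1)]


def to_fastq(reads: Iterable[Read]) -> str:
    """Serialize reads back to four-line FASTQ text."""
    return "".join(f"@{rid}\n{seq}\n+\n{qual}\n" for rid, seq, qual in reads)


sample_a = parse_fastq("@s1_r1\nACGT\n+\nIIII\n@s1_r2\nGGCC\n+\nIIII\n")
sample_b = parse_fastq("@s2_r1\nTTAA\n+\nIIII\n")
pooled = pool_reads([sample_a, sample_b], seed=1)
```

After pooling, every sequence is still present, but neither the read order nor the read names indicate which sample a read came from. How many samples to pool, and how similar they need to be, are questions the paper addresses and this sketch does not.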

Our methods obscure genomic information within raw sequence reads. They allow genetic data to be stored and shared while preventing a nefarious actor from perfectly reconstructing an organism. Broad-scale sequence information remains, while fine-scale details about specific samples are difficult or impossible to reconstruct. Our software is available at https://github.com/Geneinfosec-Inc/ReadMixer.

## Full-text entities

- **Diseases:** C-CL (MESH:D002971), Influenza (MESH:D007251), COVID (MESH:D000086382), infectious disease (MESH:D003141)
- **Species:** Homo sapiens (human, species) [taxon 9606], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Bacillus anthracis (anthrax bacterium, species) [taxon 1392], Monkeypox virus (no rank) [taxon 10244]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12932512/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12932512/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12932512/full.md

---
Source: https://tomesphere.com/paper/PMC12932512