# pygenstrat: a Python package for EIGENSTRAT data processing

**Authors:** Dilek Koptekin

PMC · DOI: 10.1093/bioadv/vbag022 · 2026-01-23

## TL;DR

pygenstrat is a Python command-line package for processing EIGENSTRAT genotype data used in ancient DNA studies. On the Allen Ancient DNA Resource it runs 2×–15× faster than convertf and uses 90%–95% less memory.

## Contribution

The paper introduces pygenstrat, a Python package with a command-line interface that uses memory-efficient, chunked algorithms to filter, subset, and convert EIGENSTRAT data.

## Key findings

- pygenstrat achieves 2×–15× speedups and 90%–95% memory reduction compared to convertf on the Allen Ancient DNA Resource.
- The package supports extensive EIGENSTRAT data operations including filtering, subsetting, and format conversion.
- pygenstrat enables reproducible and efficient processing of large ancient DNA datasets.
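
To make the subsetting operation concrete, here is a minimal sketch of selecting individuals from a text EIGENSTRAT dataset. It relies only on the file layout: each `.geno` line is one SNP and each character is one individual's genotype, in the same order as the rows of the `.ind` file. The function names `read_ind_ids` and `subset_geno` are hypothetical and are not taken from pygenstrat; this is an illustration of the idea, not the package's implementation.

```python
from typing import Iterable, Iterator, List

def read_ind_ids(ind_lines: Iterable[str]) -> List[str]:
    """Return sample IDs (first column) from .ind lines, in file order."""
    return [line.split()[0] for line in ind_lines if line.strip()]

def subset_geno(
    geno_lines: Iterable[str],
    ind_ids: List[str],
    keep: Iterable[str],
) -> Iterator[str]:
    """Yield .geno rows restricted to the individuals named in `keep`.

    Hypothetical helper: column i of every .geno row belongs to ind_ids[i],
    so subsetting individuals means keeping the same columns in every row.
    """
    keep_set = set(keep)
    cols = [i for i, sample in enumerate(ind_ids) if sample in keep_set]
    for line in geno_lines:
        row = line.rstrip("\n")
        if not row:
            continue  # skip blank lines
        yield "".join(row[i] for i in cols)

# Toy data: three individuals, three SNPs (9 marks a missing genotype).
ind_lines = ["I001 M PopA", "I002 F PopB", "I003 U PopA"]
geno_lines = ["029", "110", "902"]

ids = read_ind_ids(ind_lines)
subset = list(subset_geno(geno_lines, ids, keep=["I001", "I003"]))
```

Because rows are yielded one at a time, the sketch never holds the full genotype matrix in memory. The matching `.ind` file would need the same rows removed so that columns and samples stay aligned.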

## Abstract

Ancient DNA studies rely heavily on the EIGENSTRAT genotype format (.geno, .ind, .snp) for standard population genetic analyses including PCA, f-statistics, and qpWave/qpAdm. However, there is limited software available for processing EIGENSTRAT format data. pygenstrat, a Python package, is presented here, providing a command-line interface for comprehensive EIGENSTRAT data processing with extensive filtering, subsetting, and conversion options. pygenstrat implements memory-efficient, chunked processing algorithms for handling large ancient DNA datasets with low memory usage. It supports comprehensive operations, including updating individual and SNP files, subsetting datasets by selecting individuals or SNPs, filtering by minor allele frequency and missingness, pseudo-haploidisation, allele polarization, as well as conversion between EIGENSTRAT (text) and ANCESTRYMAP (binary) formats. Its modular architecture and Python implementation enable rapid integration with custom pipelines and future extensions.
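
The chunked filtering described above can be sketched as follows. This is an illustrative outline under stated assumptions, not pygenstrat's code: the names `snp_stats`, `chunks`, and `filter_snps`, the default thresholds, and the chunk size are all hypothetical. It assumes the text `.geno` encoding, where each character is the number of reference alleles (0, 1, or 2) and 9 marks a missing genotype.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

MISSING = "9"  # EIGENSTRAT code for a missing genotype

def snp_stats(row: str) -> Tuple[float, float]:
    """Return (missing_fraction, minor_allele_frequency) for one .geno row."""
    called = [int(c) for c in row if c != MISSING]
    missing = 1.0 - len(called) / len(row)
    if not called:
        return missing, 0.0
    # Each called genotype counts reference alleles out of two.
    ref_freq = sum(called) / (2 * len(called))
    return missing, min(ref_freq, 1.0 - ref_freq)

def chunks(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` stripped lines."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, size)]
        if not chunk:
            return
        yield chunk

def filter_snps(
    geno_lines: Iterable[str],
    min_maf: float = 0.05,       # hypothetical default
    max_missing: float = 0.5,    # hypothetical default
    chunk_size: int = 10_000,    # hypothetical default
) -> Iterator[Tuple[int, str]]:
    """Yield (zero-based line index, row) for SNPs passing both filters."""
    index = 0
    for chunk in chunks(geno_lines, chunk_size):
        for row in chunk:
            if row:
                missing, maf = snp_stats(row)
                if maf >= min_maf and missing <= max_missing:
                    yield index, row
            index += 1

# Toy data: four SNPs across four individuals.
geno = ["0012", "2222", "0199", "9991"]
kept = list(filter_snps(geno, min_maf=0.05, max_missing=0.5, chunk_size=2))
```

Only one chunk of rows is held in memory at a time, so peak memory depends on `chunk_size` rather than on the number of SNPs. The yielded line indices can be used to select the matching rows of the `.snp` file.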

Benchmarking on the Allen Ancient DNA Resource (v 62.0) shows 2×–15× speedups and 90%–95% memory reduction compared to convertf, while producing equivalent outputs for standard operations. These improvements reduce turnaround time in ancient DNA workflows and facilitate reproducible processing.

pygenstrat is open-source, available at https://github.com/dkoptekin/pygenstrat.

## Full-text entities

- **Software:** pygenstrat
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12895063/full.md

---
Source: https://tomesphere.com/paper/PMC12895063