pygenstrat: a Python package for EIGENSTRAT data processing
Dilek Koptekin

TL;DR
pygenstrat is a Python tool for processing EIGENSTRAT data used in ancient DNA studies, offering faster and more memory-efficient operations than convertf.
Contribution
pygenstrat is a memory-efficient Python package for EIGENSTRAT data processing that improves on existing tools in speed and flexibility.
Findings
pygenstrat achieves 2×–15× speedups and 90%–95% memory reduction compared to convertf on the Allen Ancient DNA Resource.
The package supports extensive EIGENSTRAT data operations including filtering, subsetting, and format conversion.
pygenstrat enables reproducible and efficient processing of large ancient DNA datasets.
Abstract
Ancient DNA studies rely heavily on the EIGENSTRAT genotype format (.geno, .ind, .snp) for standard population genetic analyses, including PCA, f-statistics, and qpWave/qpAdm. However, little software is available for processing EIGENSTRAT data. pygenstrat is a Python package that provides a command-line interface for EIGENSTRAT data processing with extensive filtering, subsetting, and conversion options. It implements memory-efficient, chunked processing algorithms so that large ancient DNA datasets can be handled with low memory usage. Supported operations include updating individual and SNP files, subsetting datasets by selecting individuals or SNPs, filtering by minor allele frequency and missingness, pseudo-haploidisation, allele polarization, and conversion between EIGENSTRAT (text) and ANCESTRYMAP (binary) formats.…
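The chunked filtering idea described in the abstract can be illustrated with a minimal sketch. This is not pygenstrat's implementation or API: the function names (read_chunks, snp_stats, pseudo_haploidise, filter_geno) and the thresholds are hypothetical. The sketch assumes the text .geno layout of one line per SNP and one character per individual, where 0, 1, and 2 count copies of the reference allele and 9 marks a missing genotype.

```python
import io
import random
from itertools import islice
from typing import Iterator, List, Optional, TextIO, Tuple

MISSING = "9"  # assumed EIGENSTRAT text code for a missing genotype


def read_chunks(handle: TextIO, chunk_size: int) -> Iterator[List[str]]:
    """Yield lists of at most chunk_size SNP rows; only one chunk is held at a time."""
    while True:
        lines = list(islice(handle, chunk_size))
        if not lines:
            return
        # Drop blank lines so every yielded row is a genotype string.
        yield [ln.strip() for ln in lines if ln.strip()]


def snp_stats(row: str) -> Tuple[float, Optional[float]]:
    """Return (missingness, minor allele frequency) for one .geno row."""
    called = [int(g) for g in row if g != MISSING]
    missingness = 1.0 - len(called) / len(row)
    if not called:
        return missingness, None  # MAF is undefined when every call is missing
    ref_freq = sum(called) / (2 * len(called))
    return missingness, min(ref_freq, 1.0 - ref_freq)


def pseudo_haploidise(row: str, rng: random.Random) -> str:
    """Replace each heterozygous call (1) with a randomly chosen homozygous call."""
    return "".join(rng.choice("02") if g == "1" else g for g in row)


def filter_geno(handle: TextIO, min_maf: float, max_missing: float,
                chunk_size: int = 1000) -> List[int]:
    """Return 0-based indices of SNP rows that pass both filters."""
    kept: List[int] = []
    index = 0
    for chunk in read_chunks(handle, chunk_size):
        for row in chunk:
            missingness, maf = snp_stats(row)
            if maf is not None and maf >= min_maf and missingness <= max_missing:
                kept.append(index)
            index += 1
    return kept


if __name__ == "__main__":
    # Toy data: five SNPs (rows) genotyped in four individuals (columns).
    toy = "0129\n2222\n9992\n0029\n1100\n"
    print(filter_geno(io.StringIO(toy), min_maf=0.05, max_missing=0.5, chunk_size=2))
    print(pseudo_haploidise("0129", random.Random(0)))
```

Because rows are consumed in fixed-size chunks, peak memory depends on chunk_size rather than on the number of SNPs in the file. The returned indices could then be used to subset the matching .snp lines, since the .geno and .snp files share the same SNP order.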
Taxonomy
Topics: Forensic and Genetic Research · Genetic Associations and Epidemiology · Genetic diversity and population structure
