# DAPCy: a Python package for the discriminant analysis of principal components method for population genetic analyses

**Authors:** Alejandro Correa Rojo, Pieter Moris, Hanne Meuwissen, Pieter Monsieurs, Dirk Valkenborg

PMC · DOI: 10.1093/bioadv/vbaf143 · Bioinformatics Advances · 2025-06-18

## TL;DR

DAPCy is a Python package, built on scikit-learn, that runs discriminant analysis of principal components (DAPC) on large genomic datasets in less time and with less memory than the original R implementation in adegenet.

## Contribution

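DAPCy reimplements the discriminant analysis of principal components method in Python on top of scikit-learn, using compressed sparse matrices and truncated singular value decomposition so that the method scales to genomic datasets with thousands of samples and features.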

## Key findings

- DAPCy processes large genomic datasets faster and with less memory than the original R implementation.
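- The package includes modules for de novo genetic clustering, training-test cross-validation, and visualization and reporting.
- Benchmarks against the R implementation on the MalariaGEN Plasmodium falciparum dataset and the 1000 Genomes Project data show shorter computation time and lower memory usage.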
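
The de novo clustering step listed above can be illustrated with a small sketch that uses plain scikit-learn rather than DAPCy's own API, which this summary does not show. It reduces simulated genotypes with truncated SVD, runs k-means for several candidate numbers of clusters, and picks the one with the highest silhouette score. The silhouette criterion, the helper names `simulate_populations` and `choose_k`, and all parameter values are assumptions for illustration, not the package's documented behaviour.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import silhouette_score


def simulate_populations(n_per_pop=60, n_snps=500, n_pops=3, seed=0):
    """Simulate 0/1/2 genotype counts for populations with distinct allele frequencies."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(n_pops):
        # Each population gets its own allele frequency per SNP.
        freqs = rng.uniform(0.05, 0.5, size=n_snps)
        blocks.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
    # Store the genotype matrix in compressed sparse row format.
    return csr_matrix(np.vstack(blocks), dtype=np.float64)


def choose_k(genotypes, k_values=range(2, 7), n_components=5, seed=0):
    """Cluster samples on truncated-SVD components and score each candidate k."""
    pcs = TruncatedSVD(n_components=n_components, random_state=seed).fit_transform(genotypes)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(pcs)
        # Silhouette score is a stand-in model-selection criterion (assumption).
        scores[k] = silhouette_score(pcs, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    best_k, scores = choose_k(simulate_populations())
    print("selected k:", best_k)
    for k, s in scores.items():
        print(f"k={k}: silhouette={s:.3f}")
```

The cluster labels obtained this way could then serve as group labels for the discriminant analysis when no prior population assignment is available.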

## Abstract

The Discriminant Analysis of Principal Components method is a pivotal tool in population genetics, combining principal component analysis and linear discriminant analysis to assess the genetic structure of populations using genetic markers, focusing on the description of variation between genetic clusters. Despite its utility, the original R implementation in the adegenet package faces computational challenges with large genomic datasets. To address these limitations, we introduce DAPCy, a Python package leveraging the scikit-learn library to enhance the method’s scalability and efficiency. DAPCy supports large datasets by utilizing compressed sparse matrices and truncated singular value decomposition for dimensionality reduction, coupled with training-test cross-validation for robust model evaluation. It also includes modules for de novo genetic clustering and extensive visualization and reporting capabilities. Compared to the original R implementation, DAPCy can process genomic datasets with thousands of samples and features in less computational time and with reduced memory usage. To show DAPCy’s computational capabilities, we benchmarked it against the R implementation using the Plasmodium falciparum dataset from MalariaGEN and data from the 1000 Genomes Project.
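
To make the pipeline described in the abstract concrete, here is a minimal sketch built directly on scikit-learn and SciPy. It does not use DAPCy's own API, which this summary does not show. The function names `simulate_genotypes` and `fit_dapc_sketch`, the simulated data, and the parameter values (10 components, a 70/30 training-test split) are assumptions chosen for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def simulate_genotypes(n_per_pop=60, n_snps=500, n_pops=3, seed=0):
    """Simulate 0/1/2 genotype counts with population labels (hypothetical data)."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for pop in range(n_pops):
        # Each population gets its own allele frequency per SNP.
        freqs = rng.uniform(0.05, 0.5, size=n_snps)
        blocks.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
        labels.extend([pop] * n_per_pop)
    # Compressed sparse row storage, as mentioned in the abstract.
    return csr_matrix(np.vstack(blocks), dtype=np.float64), np.array(labels)


def fit_dapc_sketch(genotypes, labels, n_components=10, test_size=0.3, seed=0):
    """Truncated SVD followed by LDA, scored on a held-out test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        genotypes, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    model = make_pipeline(
        # TruncatedSVD accepts sparse input without densifying it.
        TruncatedSVD(n_components=n_components, random_state=seed),
        LinearDiscriminantAnalysis(),
    )
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)   # classification accuracy on test samples
    coords = model.transform(X_test)         # discriminant-axis coordinates
    return accuracy, coords


if __name__ == "__main__":
    X, y = simulate_genotypes()
    accuracy, coords = fit_dapc_sketch(X, y)
    print("test accuracy:", accuracy)
    print("discriminant coordinates shape:", coords.shape)
```

With three groups, linear discriminant analysis yields two discriminant axes, so the returned coordinates can be drawn as a two-dimensional scatter plot of the test samples.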

DAPCy can be installed as a Python package through pip. Source code is available at https://gitlab.com/uhasselt-bioinfo/dapcy. Documentation and a tutorial can be found at https://uhasselt-bioinfo.gitlab.io/dapcy/.

## Linked entities

- **Species:** Plasmodium falciparum (taxon 5833)

## Full-text entities

- **Species:** Plasmodium falciparum (malaria parasite P. falciparum, species) [taxon 5833]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12237503/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12237503/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/PMC12237503/full.md

---
Source: https://tomesphere.com/paper/PMC12237503