# PISAD: reference-free intraspecies sample anomalies detection tool based on k-mer counting

**Authors:** Zhantian Xu, Fan Nie, Jianxin Wang

PMC · DOI: 10.1093/gigascience/giaf061 · GigaScience · 2025-06-17

## TL;DR

PISAD detects sample swaps in whole-genome sequencing data without requiring a reference genome, which makes it applicable to nonmodel species that lack reference resources.

## Contribution

PISAD introduces a reference-free method for sample identity validation using k-mer counting and SNP concordance.

## Key findings

- PISAD requires only 0.5× data coverage, lower than the reference-based tool ntsm.
- It was evaluated on multiple diploid species: Homo sapiens, Bos taurus, Gallus gallus, Arctia plantaginis, and Pyrus species.

## Abstract

Genomic sequencing research often requires the simultaneous analysis of heterogeneous data types across single or multiple individuals, introducing a substantial risk of sample swaps (e.g., labeling errors). Existing methods primarily rely on reference information, requiring the preselection of informative variant sites with a population allele frequency around 0.5, which may be insufficient or unavailable for nonmodel organisms. As research expands to encompass a growing number of new species, a robust quality control tool will become increasingly important.

We developed PISAD (Phased Intraspecies Sample Anomalies Detection), a tool for validating sample identities in whole-genome sequencing (WGS) data without requiring reference information. It uses a 2-stage approach: first, it performs rapid, reference-free single nucleotide polymorphism (SNP) calling on low-error-rate data from the target individual to create a variant sketch; then, it assesses the concordance of other samples on this sketch to verify relationships. We assessed the performance and efficiency of PISAD on Homo sapiens, Bos taurus, Gallus gallus, Arctia plantaginis, and Pyrus species.

Our evaluation showed that PISAD requires lower data coverage (0.5×) than the reference-based tool ntsm and is broadly applicable to multiple diploid species.
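The 2-stage approach described in the abstract can be illustrated with a minimal Python sketch. This is not PISAD's implementation: the function names, the toy `k = 5`, the `min_count` and `min_depth` thresholds, and the "both alleles seen" score are all assumptions made for illustration, and the sketch ignores reverse complements and sequencing errors. It only shows the general idea of pairing k-mers that differ at one base to form a variant sketch, then checking how another sample's k-mers land on that sketch.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_variant_sketch(counts, k, min_count=2):
    """Toy stage 1: pair solid k-mers that differ only at the middle base.

    Each pair is treated as a candidate heterozygous SNP of the target.
    The min_count filter is a hypothetical stand-in for error filtering.
    """
    mid = k // 2
    groups = {}
    for kmer, c in counts.items():
        if c >= min_count:
            key = (kmer[:mid], kmer[mid + 1:])
            groups.setdefault(key, []).append(kmer)
    # Keep only sites with exactly two alleles (biallelic candidates).
    return sorted(tuple(sorted(g)) for g in groups.values() if len(g) == 2)


def het_concordance(sketch, sample_counts, min_depth=2):
    """Toy stage 2: among sketch sites with at least min_depth observations,
    return the fraction where the sample shows both alleles.

    Returns None when no site has enough observations.
    """
    informative = 0
    both = 0
    for a, b in sketch:
        ca, cb = sample_counts[a], sample_counts[b]
        if ca + cb >= min_depth:
            informative += 1
            if ca > 0 and cb > 0:
                both += 1
    return both / informative if informative else None


# Two haplotypes of a toy diploid target, differing at one base.
hap1 = "GATTACAGGCT"
hap2 = "GATTAGAGGCT"
k = 5

target_counts = count_kmers([hap1] * 3 + [hap2] * 3, k)
sketch = build_variant_sketch(target_counts, k)

same_individual = count_kmers([hap1, hap2], k)   # carries both alleles
other_individual = count_kmers([hap1, hap1], k)  # homozygous at the site

print(sketch)
print(het_concordance(sketch, same_individual))
print(het_concordance(sketch, other_individual))
```

In this toy case the sample from the same individual supports both alleles at the sketched site, while the homozygous sample supports only one. A real tool has to aggregate such evidence over many sites, since at 0.5× coverage most individual sites have too few observations to be informative on their own.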

## Linked entities

- **Species:** Homo sapiens (taxon 9606), Bos taurus (taxon 9913), Gallus gallus (taxon 9031), Arctia plantaginis (taxon 874455), Pyrus (taxon 3766)

## Full-text entities

- **Species:** Bos taurus (bovine, species) [taxon 9913], Pyrus (pears, genus) [taxon 3766], Homo sapiens (human, species) [taxon 9606], Arctia plantaginis (species) [taxon 874455], Gallus gallus (bantam, species) [taxon 9031]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12202988/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12202988/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC12202988/full.md

---
Source: https://tomesphere.com/paper/PMC12202988