# AlleleMiner: a long-read pipeline for gene-wise de novo allele phasing and variant detection in diploid citrus cultivars

**Authors:** Yukinari Kiryu, Yoshihiro Kawahara, Tomoko Endo, Tokumasa Horiike, Kenta Shirasawa, Sachiko Isobe, Takehiko Shimada, Hiroshi Fujii

PMC · DOI: 10.1093/dnares/dsag004 · DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes · 2026-03-03

## TL;DR

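AlleleMiner is a Python-based pipeline that phases alleles gene by gene in diploid citrus cultivars directly from PacBio HiFi long reads. It uses the reference genome only to locate target gene regions, enabling gene-level allele discovery with minimal reference dependence.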

## Contribution

AlleleMiner introduces a reference-minimized approach for gene-wise de novo allele phasing using PacBio HiFi reads in diploid crops.
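
To make the "reference-minimized" idea concrete, the following is a minimal sketch of the first step only: using reference coordinates to collect the reads that belong to one gene locus, so they can be assembled de novo afterwards. The `Alignment` and `GeneRegion` types, the `reads_for_gene` function, and the `flank` parameter are hypothetical names introduced for illustration. The summary does not say which aligner, assembler, or data structures AlleleMiner actually uses.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alignment:
    """A read placed on the reference (hypothetical record, half-open interval)."""
    read_id: str
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass(frozen=True)
class GeneRegion:
    """A target gene locus on the reference (hypothetical record)."""
    gene_id: str
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def reads_for_gene(gene: GeneRegion,
                   alignments: Iterable[Alignment],
                   flank: int = 0) -> List[str]:
    """Return sorted IDs of reads overlapping the gene region, padded by `flank`.

    The reference is used here only to decide which reads belong to the locus;
    the read sequences themselves would then go to a de novo assembler.
    """
    lo = max(0, gene.start - flank)
    hi = gene.end + flank
    selected = {
        a.read_id
        for a in alignments
        if a.contig == gene.contig and a.start < hi and a.end > lo
    }
    return sorted(selected)
```

In this sketch the reference contributes coordinates and nothing else: no variant is called against it, which is the sense in which the allele representation is independent of the reference sequence.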

## Key findings

- AlleleMiner achieved an average phasing output of 91.5% of 1,409 single-copy genes across 18 citrus cultivars.
- A HiFi depth of ~30× is preferable for stable recovery of heterozygous alleles and reduces potential allele dropout.
- Pedigree-based validation showed allele transmission patterns consistent with known relationships, and tests on simulated haplotype data gave complete-match reconstruction of both alleles at approximately 70% of loci.
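
As a rough illustration of what a ~30× depth means in practice, the sketch below estimates mean sequencing depth as total sequenced bases divided by genome size, and computes the fraction of reads to keep when downsampling to a target depth. The function names and the `genome_size` parameter are assumptions made for this example; the summary does not describe how the authors computed depth or subsampled their datasets.

```python
from typing import Iterable


def mean_depth(read_lengths: Iterable[int], genome_size: int) -> float:
    """Estimate mean depth as total sequenced bases / genome size (both in bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def subsample_fraction(current_depth: float, target_depth: float = 30.0) -> float:
    """Fraction of reads to keep to reach `target_depth`.

    Capped at 1.0, because downsampling cannot increase depth.
    """
    if current_depth <= 0:
        raise ValueError("current_depth must be positive")
    return min(1.0, target_depth / current_depth)
```

For example, 2,000 reads of 15 kb over a 1 Mb region give a mean depth of 30×, and a 60× dataset would need a keep-fraction of 0.5 to approximate 30×.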

## Abstract

Allelic variation is a critical determinant of agronomic traits in heterozygous crops. Most existing approaches define variation as reference-anchored differences, such as SNPs or structural variants, confining allelic diversity to variant feature coordinates. Here, we present AlleleMiner, a Python-based pipeline that phases diploid gene sequences directly from PacBio HiFi reads. Rather than relying on reference-based coordinate systems for allele representation, AlleleMiner uses the reference genome solely to identify target gene region sequences and performs de novo assembly of read sets at each locus, minimizing reference dependence and reconstructing phased allele sequences. Across 18 citrus cultivars, the pipeline achieved an average phasing output of 91.5% of 1,409 single-copy genes. Coverage analyses using both real and simulated datasets indicated that a ∼30× HiFi depth is preferable for the stable recovery of heterozygous alleles, reducing potential allele dropout. Validation using pedigree information showed allele transmission patterns consistent with known relationships. Using simulated haplotype data and the Citrus clementina assembly v1.0, AlleleMiner achieved complete-match reconstruction for both alleles at approximately 70% of loci. By enabling reference-minimized gene-level allele discovery, AlleleMiner provides a scalable framework for constructing allele databases and advancing marker-assisted and genomic selection in complex crops.
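
The "complete-match reconstruction for both alleles" criterion can be sketched as follows, under stated assumptions: a locus counts as a complete match only when both reconstructed allele sequences are identical to the two simulated true alleles, compared as an unordered pair because phase labels are arbitrary. The function names and the input layout are hypothetical. The sketch also assumes that sequences are already in the same orientation and trimmed to the same boundaries, which the summary does not specify.

```python
from typing import Dict, Tuple

AllelePair = Tuple[str, str]


def is_complete_match(reconstructed: AllelePair, truth: AllelePair) -> bool:
    """True if both alleles match exactly, ignoring which one is labeled first."""
    return sorted(reconstructed) == sorted(truth)


def complete_match_rate(results: Dict[str, Tuple[AllelePair, AllelePair]]) -> float:
    """Fraction of loci whose two reconstructed alleles both match the truth.

    `results` maps a gene ID to (reconstructed pair, true pair).
    Returns 0.0 for an empty input.
    """
    if not results:
        return 0.0
    hits = sum(is_complete_match(rec, tru) for rec, tru in results.values())
    return hits / len(results)
```

Sorting each pair before comparison means a locus where only one allele was recovered (for example, a heterozygous locus collapsed to two identical sequences) is counted as a miss, which matches the "both alleles" wording of the criterion.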

## Full-text entities

- **Species:** Citrus (genus) [taxon 2706]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13011809/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13011809/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/PMC13011809/full.md

---
Source: https://tomesphere.com/paper/PMC13011809