# Benchmarking long-read variant calling in diploid and polyploid genomes: insights from human and plants

**Authors:** Yoshinori Fukasawa

PMC · DOI: 10.1186/s12864-025-12259-5 · BMC Genomics · 2026-01-15

## TL;DR

This paper benchmarks long-read variant calling in diploid and polyploid genomes. It finds that genotyping accuracy falls as ploidy rises, and that genome complexity and structural variations between the reference and sample genomes are major sources of calling error.

## Contribution

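The study shows that structural variations between the reference and sample genomes can induce spurious long-read mapping and false variant calls, particularly when they contain repetitive elements. This is a distinct and more dominant error source than allelic-dosage uncertainty in polyploids.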

## Key findings

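- Variant sites are often detected accurately, but genotyping accuracy decreases with increasing ploidy because of allelic-dosage uncertainty.
- Structural variations between the reference and sample genomes, particularly those containing repetitive elements, cause spurious read mapping and false variant calls.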
- Genome complexity, such as repeat content, strongly influences overall variant calling accuracy.
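The first finding can be made concrete with a toy likelihood model. In a genome of ploidy k, a site carrying d alternate copies is expected to yield an alternate-read fraction of about d/k. Neighbouring dosages therefore differ by only 1/k and become harder to tell apart at a fixed read depth.

The sketch below is a minimal illustration with several assumptions:

- a binomial read-count model;
- a symmetric per-read error rate;
- a flat prior over dosages.

The function names `dosage_likelihoods` and `best_dosage` are hypothetical. This is not the genotyper benchmarked in the paper.

```python
from math import comb
from typing import List, Tuple


def dosage_likelihoods(alt_reads: int, depth: int, ploidy: int, error: float = 0.01) -> List[float]:
    """Normalized binomial likelihoods for each alt dosage 0..ploidy (flat prior assumed)."""
    likes = []
    for dosage in range(ploidy + 1):
        frac = dosage / ploidy
        # Expected alt-read fraction, allowing symmetric per-read error in both directions.
        p_alt = frac * (1 - error) + (1 - frac) * error
        likes.append(comb(depth, alt_reads) * p_alt**alt_reads * (1 - p_alt) ** (depth - alt_reads))
    total = sum(likes)
    return [x / total for x in likes]


def best_dosage(alt_reads: int, depth: int, ploidy: int, error: float = 0.01) -> Tuple[int, float]:
    """Most likely dosage and its normalized support under the toy model."""
    post = dosage_likelihoods(alt_reads, depth, ploidy, error)
    best = max(range(len(post)), key=post.__getitem__)
    return best, post[best]


# The same read evidence (15 alt reads out of 30) interpreted at three ploidy levels.
for ploidy in (2, 4, 6):
    dosage, support = best_dosage(15, 30, ploidy)
    print(f"ploidy={ploidy} dosage={dosage} support={support:.3f}")
```

With 15 alternate reads out of 30, the most likely dosage is 1 (diploid), 2 (tetraploid) and 3 (hexaploid). The normalized support for that call shrinks as ploidy grows, because the competing dosages on either side sit closer to the observed read fraction.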

## Abstract

Accurate characterization of genetic variation is fundamental to genomics. While long-read sequencing technologies promise to resolve complex genomic regions and improve variant detection, their application in complex genomes has not been well validated. Here, we systematically investigate the factors influencing variant calling accuracy using accurate long reads. Using human trio data with known variants to simulate variable ploidy levels (diploid, tetraploid, hexaploid), we demonstrate that while variant sites can often be identified accurately, genotyping accuracy decreases with increasing ploidy due to allelic dosage uncertainty. This highlights a specific challenge in assigning correct allele counts in polyploids even with high depth, separate from the initial variant discovery. We then assessed genotyping and variant detection performance in real genomes with varying complexity: the relatively simple diploid Fragaria vesca, the tetraploid Solanum tuberosum, and the highly repetitive diploid Zea mays. Our results reveal that overall variant calling accuracy is influenced strongly by inherent genome complexity (e.g., repeat content). Furthermore, we identify a critical mechanism impacting variant discovery: structural variations between the reference and sample genomes, particularly those containing repetitive elements, can induce spurious read mapping. This effect is likely exacerbated by the length and accuracy of long reads. This leads to false variant calls, constituting a distinct and more dominant source of error than allelic-dosage uncertainty. Our findings underscore the multifaceted challenges in long-read variant analysis and highlight the need for ploidy-aware genotypers and bias-aware mapping strategies to fully realize the potential of long reads in diverse organisms.
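The abstract separates variant discovery (finding the site) from genotyping (assigning the correct allele count). The sketch below illustrates that distinction under a stated assumption.

It assumes pseudo-polyploid truth sets are formed by summing the per-site alternate-allele counts of two (tetraploid) or three (hexaploid) diploid individuals. This is one plausible reading of how trio data can simulate higher ploidy, not the paper's documented procedure. The names `pool_genotypes` and `score`, the site keys and the toy genotypes are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical site key: (chromosome, position, ref allele, alt allele).
Site = Tuple[str, int, str, str]


def pool_genotypes(*individuals: Dict[Site, int]) -> Dict[Site, int]:
    """Sum alt-allele counts (0, 1 or 2) of diploid individuals into one pseudo-polyploid.

    Assumption: a site absent from an individual's dict is homozygous reference (count 0).
    """
    pooled: Dict[Site, int] = {}
    for genotypes in individuals:
        for site, alt_count in genotypes.items():
            pooled[site] = pooled.get(site, 0) + alt_count
    # Keep only sites that carry at least one alt copy in the pooled sample.
    return {site: dosage for site, dosage in pooled.items() if dosage > 0}


def score(truth: Dict[Site, int], calls: Dict[Site, int]) -> Dict[str, float]:
    """Site-level precision/recall plus exact-dosage concordance at true-positive sites."""
    truth_sites = {s for s, d in truth.items() if d > 0}
    called_sites = {s for s, d in calls.items() if d > 0}
    true_pos = truth_sites & called_sites
    matched = sum(1 for s in true_pos if calls[s] == truth[s])
    return {
        "precision": len(true_pos) / len(called_sites) if called_sites else 0.0,
        "recall": len(true_pos) / len(truth_sites) if truth_sites else 0.0,
        "genotype_concordance": matched / len(true_pos) if true_pos else 0.0,
    }


# Toy trio (alt-allele counts per diploid individual); values are illustrative only.
A, B, C = ("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 300, "G", "A")
mother = {A: 1, B: 2}
father = {A: 1, C: 1}
child = {A: 2, B: 1, C: 1}

hexaploid_truth = pool_genotypes(mother, father, child)  # {A: 4, B: 3, C: 2}
calls = {A: 4, B: 2, C: 2}  # every site found, but B's dosage is under-called
metrics = score(hexaploid_truth, calls)
print(metrics)
```

In this toy example every variant site is recovered, giving precision and recall of 1.0. Yet only two of the three dosages are correct. This is the pattern of accurate site detection with degraded genotyping that the abstract describes for higher ploidy.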

The online version contains supplementary material available at 10.1186/s12864-025-12259-5.

## Linked entities

- **Species:** Fragaria vesca (taxon 57918), Solanum tuberosum (taxon 4113), Zea mays (taxon 4577)

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Fragaria vesca (alpine strawberry, species) [taxon 57918], Solanum tuberosum (potatoes, species) [taxon 4113], Zea mays (maize, species) [taxon 4577]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12809965/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12809965/full.md

---
Source: https://tomesphere.com/paper/PMC12809965