# KBeagle: An Adaptive Strategy and Tool for Improving Imputation Accuracy and Computation Time

**Authors:** Xingyu Guo, Jie Qin, Shikai Wang, Jincheng Zhong, Li Liu, Yixi Kangzhu, Daoliang Lan, Jiabo Wang

PMC · DOI: 10.3390/ijms26125797 · International Journal of Molecular Sciences · 2025-06-18

## TL;DR

KBeagle is an imputation strategy that clusters reference individuals with K-Means to improve both the accuracy and the speed of predicting missing genotypes.

## Contribution

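KBeagle introduces an adaptive strategy that uses K-Means clustering, run with multithreading, to split reference individuals into subpopulations. This reduces the number and increases the length of haplotypes within each subpopulation, which improves imputation accuracy and efficiency.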
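The cluster-then-impute idea can be sketched as below. This is a hypothetical illustration, not the authors' code: genotypes are assumed to be coded 0/1/2 with `None` for missing calls, clustering is plain Lloyd's K-Means with a deterministic farthest-point start, and the per-cluster imputer (`impute_cluster`, fill with the cluster's most common genotype) is a placeholder standing in for an actual Beagle run on each subpopulation.

```python
"""Hypothetical sketch of a KBeagle-style cluster-then-impute workflow."""
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def dist(a, b):
    # Mean squared difference over loci observed in both vectors (None = missing)
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("inf")
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def kmeans(samples, k, iters=20):
    """Lloyd's K-Means on genotype vectors; returns one cluster label per sample."""
    # Deterministic farthest-point initialisation (an assumption, not KBeagle's)
    centroids = [list(samples[0])]
    while len(centroids) < k:
        far = max(samples, key=lambda s: min(dist(s, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist(s, centroids[c])) for s in samples]
        if new == labels:  # converged: assignments stopped changing
            break
        labels = new
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            for j in range(len(centroids[c])):
                obs = [m[j] for m in members if m[j] is not None]
                if obs:
                    centroids[c][j] = sum(obs) / len(obs)
    return labels


def impute_cluster(members):
    # Placeholder for running Beagle on one subpopulation:
    # fill each missing call with the cluster's most common genotype at that locus
    modes = []
    for j in range(len(members[0])):
        obs = [m[j] for m in members if m[j] is not None]
        modes.append(Counter(obs).most_common(1)[0][0] if obs else 0)
    return [[modes[j] if g is None else g for j, g in enumerate(m)] for m in members]


def kbeagle_like(samples, k, threads=4):
    """Cluster samples, impute each cluster in its own thread, restore original order."""
    labels = kmeans(samples, k)
    groups = {c: [i for i, lab in enumerate(labels) if lab == c] for c in set(labels)}
    out = [None] * len(samples)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {c: pool.submit(impute_cluster, [samples[i] for i in idx])
                   for c, idx in groups.items()}
        for c, fut in futures.items():
            for i, row in zip(groups[c], fut.result()):
                out[i] = row
    return out


# Toy panel: two clearly separated groups with scattered missing calls
panel = [
    [0, 0, 0, None], [0, 0, None, 0], [0, None, 0, 0],
    [2, 2, 2, None], [2, None, 2, 2], [2, 2, None, 2],
]
print(kbeagle_like(panel, k=2))
```

Threads are used here because a real pipeline would plausibly launch an external Beagle process per cluster, where Python's GIL is not a bottleneck; a pure-Python imputer would gain little from threading.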

## Key findings

- The KBeagle-imputed dataset (KID) identifies more trait-associated SNP loci than the Beagle-imputed dataset (BID).
- KID achieves much lower false discovery rates (FDRs) and Type I error rates at the same power to detect association signals.
- The method improves the imputation matching rate and reduces computation time.
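The metrics named above can be computed as in the sketch below. These are common textbook definitions assumed for illustration; the paper's exact definitions may differ (for example, simulation studies often count a hit within a window around a causal QTN as a true positive, whereas this sketch requires an exact SNP index match). The function names are hypothetical.

```python
def matching_rate(truth, imputed, masked):
    """Share of masked genotype calls that imputation recovered exactly.

    masked: (individual, locus) positions hidden before imputation.
    """
    masked = list(masked)
    hits = sum(truth[i][j] == imputed[i][j] for i, j in masked)
    return hits / len(masked)


def association_error_rates(significant, causal, n_snps):
    """Power, FDR and Type I error from sets of SNP indices.

    significant: SNPs declared significant; causal: simulated QTN positions.
    """
    tp = len(significant & causal)   # true QTNs that were detected
    fp = len(significant - causal)   # non-causal SNPs declared significant
    power = tp / len(causal)
    fdr = fp / len(significant) if significant else 0.0
    type1 = fp / (n_snps - len(causal))  # false positives among null SNPs
    return power, fdr, type1


print(matching_rate([[0, 1, 2], [2, 1, 0]], [[0, 1, 1], [2, 1, 0]], [(0, 2), (1, 0)]))
print(association_error_rates({1, 5, 9, 20}, {1, 5, 7}, n_snps=100))
```

Comparing FDR and Type I error "under the same power" means fixing the detection threshold for each dataset so that `power` matches, then comparing the other two rates.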

## Abstract

Whole-genome sequencing (WGS) technology has made significant progress in obtaining the genomic information of organisms and is now the primary way to uncover genetic variation. However, due to the complexity of the genome and technical limitations, large genome segments remain ungenotyped. Imputation is a useful strategy for predicting missing genotypes. The accuracy and computing speed of imputation software are important criteria that should inform future developments in genomic research. In this study, the K-Means algorithm and multithreading were used to cluster reference individuals to reduce the number and improve the length of haplotypes in the subpopulation. We named this strategy “KBeagle”. In the comparison test, we determined that the KBeagle-imputed dataset (KID) can identify more single-nucleotide polymorphism (SNP) loci associated with the specified traits compared to the Beagle-imputed dataset (BID), while also achieving much lower false discovery rates (FDRs) and Type I error rates under the same power of detection of association signals. We envision that the main application of KBeagle will focus on livestock sequencing studies under a strong genetic structure. In summary, we have generated an accurate and efficient imputation method, improving the imputation matching rate and calculation time.
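One intuition for why clustering the reference individuals helps is that each subpopulation's panel contains fewer distinct haplotypes than the full panel. The toy example below counts distinct haplotypes over a window; the haplotypes and cluster labels are made up for illustration and are not data from the paper.

```python
def distinct_haplotypes(haplotypes, start, end):
    # Number of distinct allele sequences over loci [start, end)
    return len({tuple(h[start:end]) for h in haplotypes})


def per_cluster_counts(haplotypes, labels, start, end):
    # Distinct-haplotype count within each cluster of the reference panel
    clusters = {}
    for h, lab in zip(haplotypes, labels):
        clusters.setdefault(lab, []).append(h)
    return {lab: distinct_haplotypes(hs, start, end) for lab, hs in sorted(clusters.items())}


haps = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0],
]
labels = [0, 0, 0, 1, 1, 1]  # hypothetical K-Means assignment
print(distinct_haplotypes(haps, 0, 6))        # 4
print(per_cluster_counts(haps, labels, 0, 6))  # {0: 2, 1: 2}
```

Under strong genetic structure, as in the livestock setting the abstract targets, each cluster's panel is more homogeneous, so an imputation model run on one subpopulation plausibly has fewer candidate haplotypes to weigh at each window.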

## Full-text entities

- **Species:** Gallus gallus (bantam, species) [taxon 9031], Homo sapiens (human, species) [taxon 9606], Sus scrofa (pig, species) [taxon 9823], Bos taurus (bovine, species) [taxon 9913]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12192696/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12192696/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12192696/full.md

---
Source: https://tomesphere.com/paper/PMC12192696