# Decreasing the computing time of approximated reliabilities of genomic estimated breeding values in the single-step genomic best linear unbiased predictor using different core sizes for the algorithm for proven and young

**Authors:** S.N. Sanchez-Sierra, Matias Bermann, Natascha Vukasinovic, Miguel A. Sánchez-Castro, Ignacy Misztal, Daniela Lourenco

PMC · DOI: 10.3168/jdsc.2025-0892 · JDS Communications · 2026-01-22

## TL;DR

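Shrinking the APY core set from 25,000 to 5,000 genotyped animals cut the time needed to approximate GEBV reliabilities in ssGBLUP about 3.1-fold (171.27 min to 55.02 min), but it shifted the approximated reliabilities relative to the 25,000-animal benchmark.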

## Contribution

The study shows that the computing time for approximating GEBV reliabilities in ssGBLUP can be reduced by shrinking the APY core set, and it quantifies how much the approximated reliabilities change as a result.

## Key findings

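- Reducing the APY core size from 25,000 to 5,000 genotyped animals decreased the computing time of the reliability-approximation algorithm by up to 95.4%.
- Reliabilities approximated with smaller cores had correlations of 0.94 to 1.00 with the 25,000-animal benchmark; regression intercepts ranged from −0.16 to 0.38 and slopes from 0.64 to 1.15.
- The fastest approximation took 55.02 min with the 5,000-animal core versus 171.27 min for the 25,000-animal benchmark, a 3.1-fold reduction in computing time, alongside a 2.1-fold reduction in memory usage.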
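
The percentage and fold figures above cannot describe the same measurement, and a little arithmetic makes that visible. The sketch below uses only the two wall-clock times quoted in the abstract (171.27 min and 55.02 min); the function names are hypothetical helpers, not part of any software used in the study. This summary does not say which part of the run the "up to 95.4%" figure was measured on, so the sketch only shows what each number implies.

```python
# Wall-clock times quoted in the abstract, in minutes.
BENCHMARK_MIN = 171.27  # 25k core (benchmark)
REDUCED_MIN = 55.02     # 5k core


def fold_reduction(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


def percent_reduction(before: float, after: float) -> float:
    """Share of `before` that was saved, as a percentage."""
    return 100.0 * (before - after) / before


def fold_from_percent(percent: float) -> float:
    """Fold reduction implied by a given percentage reduction."""
    return 1.0 / (1.0 - percent / 100.0)


if __name__ == "__main__":
    fold = fold_reduction(BENCHMARK_MIN, REDUCED_MIN)
    pct = percent_reduction(BENCHMARK_MIN, REDUCED_MIN)
    print(f"25k -> 5k core: {fold:.2f}-fold, i.e. {pct:.1f}% less time")
    print(f"A 95.4% reduction would be {fold_from_percent(95.4):.1f}-fold")
```

With these inputs the reported times work out to roughly a 3.11-fold, or 67.9%, reduction, whereas a 95.4% reduction would correspond to about a 21.7-fold speed-up. The two figures therefore presumably refer to different portions of the computation.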

## Abstract

Summary: Calculating the reliabilities of genomic breeding values in large datasets is computationally challenging; therefore, approximation methods, such as the algorithm for proven and young (APY) within the single-step genomic best linear unbiased predictor (ssGBLUP) methodology, are used. Reducing the APY core group size from 25,000 to 5,000 genotyped animals decreased the computing time of the reliability-approximation algorithm by up to 95.4%. However, this reduction introduced bias into the approximated reliabilities, highlighting the trade-off between computational efficiency and accuracy.
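
To make concrete what the core size controls, here is a minimal dense sketch of the APY inverse of a genomic relationship matrix as it is commonly written in the APY literature: relationships among noncore animals are modelled only through the core animals, so the noncore block of the inverse is diagonal. The function name `apy_inverse` and the toy data are assumptions for illustration; this is not the authors' implementation, and production software would use sparse storage rather than a dense array.

```python
import numpy as np


def apy_inverse(G: np.ndarray, core: np.ndarray) -> np.ndarray:
    """Dense toy version of the APY inverse of a relationship matrix.

    G    : (n, n) symmetric positive-definite relationship matrix
    core : integer indices of the core animals
    Returns the APY inverse in the original animal order.
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]

    # Regression coefficients of each noncore animal on the core animals.
    P = Gcc_inv @ Gcn  # shape (n_core, n_noncore)

    # Diagonal of M_nn: m_i = g_ii - g_ic Gcc^{-1} g_ci for noncore animal i.
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)
    m_inv = 1.0 / m

    G_inv = np.zeros((n, n))
    # Core block: dense, and the only part whose size depends on n_core^2.
    G_inv[np.ix_(core, core)] = Gcc_inv + (P * m_inv) @ P.T
    G_inv[np.ix_(core, noncore)] = -P * m_inv
    G_inv[np.ix_(noncore, core)] = (-P * m_inv).T
    # Noncore block: diagonal only, which is the source of the sparsity.
    G_inv[noncore, noncore] = m_inv
    return G_inv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((8, 50))          # toy "genotypes"
    G = Z @ Z.T / 50 + 0.01 * np.eye(8)       # toy relationship matrix
    G_inv = apy_inverse(G, core=np.array([0, 1, 2]))
    print(np.count_nonzero(G_inv), "nonzeros out of", G_inv.size)
```

In this sketch the dense core block and the product `(P * m_inv) @ P.T` both grow with the square of the number of core animals, which is one way to see why a smaller core is cheaper. Whether this is the specific step the authors identify as the bottleneck is not stated in this summary.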

- Calculating the reliability of genomic breeding values is computationally challenging.
- We present strategies for reducing computing time when approximating reliabilities in large-scale genomic evaluations.
- Our approach speeds up computations, though reliability approximations may be affected.

The single-step genomic best linear unbiased predictor (ssGBLUP) along with the algorithm for proven and young (APY) are used to compute GEBV in livestock populations with extensive genomic data. Calculating GEBV reliabilities is computationally expensive, particularly with many genotyped animals, because it requires inverting the left-hand side of the mixed model equations. However, reliabilities in ssGBLUP models can be approximated by leveraging the sparse structure of the APY. The primary computational bottleneck of the algorithm lies in a matrix multiplication step, which scales quadratically with the size of the core set. This study aimed to decrease the computing time for approximating GEBV reliabilities in ssGBLUP by reducing the size of the core set in APY without compromising the precision of the reliability approximations. Reliabilities were approximated for a single-trait model for calf respiratory disease in Holsteins (h² = 0.042). A dataset comprising 4,563,070 animals in the pedigree, 1,629,592 genotypes, and 1,585,306 records was used for the study. Core sets of varying sizes (25k, 20k, 15k, 10k, and 5k) were evaluated. Approximated reliabilities obtained with a core set size of 25k were used as a comparison benchmark. Correlations between approximated reliabilities obtained with different core sizes and the benchmark ranged from 0.94 to 1.00, whereas the intercept and slope of the regression of the benchmark reliabilities on the smaller core reliabilities ranged from −0.16 to 0.38 and from 0.64 to 1.15, respectively. Computing times varied significantly, with the fastest approximation (55.02 min) achieved using a 5k core, compared with 171.27 min for the 25k core benchmark. This represents a 3.1-fold reduction in computing time and a 2.1-fold reduction in memory usage when comparing the 25k core size with the 5k core size. Additionally, more substantial savings can be obtained as the number of traits increases.
Having fewer genotyped animals in the APY core is a reasonable approach to accelerate GEBV reliability calculations; however, changes in the approximated reliabilities occur, underscoring the trade-off between computational efficiency and the accuracy of the approximations.
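
The comparison statistics in the abstract (correlation, plus the intercept and slope from regressing benchmark reliabilities on reliabilities from a smaller core) can be computed as in the sketch below. An intercept near 0 and a slope near 1 would indicate no systematic shift relative to the benchmark. The function name `compare_to_benchmark` and the synthetic reliabilities are assumptions for illustration only; they are not the study's data or code.

```python
import numpy as np


def compare_to_benchmark(benchmark, reduced):
    """Correlation, intercept, and slope of `benchmark` regressed on `reduced`.

    benchmark : reliabilities from the reference core size (y)
    reduced   : reliabilities from a smaller core size (x)
    """
    y = np.asarray(benchmark, dtype=float)
    x = np.asarray(reduced, dtype=float)
    corr = np.corrcoef(x, y)[0, 1]
    # polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(x, y, deg=1)
    return {"correlation": corr, "intercept": intercept, "slope": slope}


if __name__ == "__main__":
    # Synthetic reliabilities, made up for the example.
    rng = np.random.default_rng(1)
    reduced = rng.uniform(0.2, 0.8, size=1_000)
    benchmark = 0.05 + 0.9 * reduced + rng.normal(0.0, 0.01, size=1_000)
    stats = compare_to_benchmark(benchmark, reduced)
    print({name: round(value, 3) for name, value in stats.items()})
```

Note the direction of the regression: the benchmark is the response and the smaller-core reliabilities are the predictor, matching the wording of the abstract. Swapping the two would give a different intercept and slope.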

## Full-text entities

- **Diseases:** respiratory disease (MESH:D012140)
- **Species:** Bos taurus (bovine, species) [taxon 9913]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12958191/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12958191/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12958191/full.md

---
Source: https://tomesphere.com/paper/PMC12958191