# Boosting Genomic Prediction Transferability with Sparse Testing

**Authors:** Osval A. Montesinos-López, Jose Crossa, Paolo Vitale, Guillermo Gerard, Leonardo Crespo-Herrera, Susanne Dreisigacker, Carolina Saint Pierre, Iván Delgado-Enciso, Abelardo Montesinos-López, Reka Howard

PMC · DOI: 10.3390/genes16070827 · Genes · 2025-07-16

## TL;DR

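This paper evaluates sparse testing for genomic prediction: training data from Obregon, Mexico (CIMMYT) are combined with partial data from India to predict line performance in India, reducing the cost of evaluating every genotype in every environment while maintaining accuracy.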

## Contribution

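The study evaluates a sparse testing strategy for predicting tested lines in untested environments and shows that enriching the training set with relevant, temporally proximate data improves genomic prediction accuracy, whereas incorporating unrelated data can reduce it.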

## Key findings

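- Adding Obregon data to the training set improved Pearson's correlation by at least 219% across environments (at a testing proportion of 50%).
- The percentage of matching top lines rose by 18.42% for the top 10% of lines and by 20.79% for the top 20% (also at a testing proportion of 50%).
- The improvement was greater when the added data were temporally closer; incorporating unrelated data can reduce prediction accuracy.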
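
The two metrics behind these findings can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes "percentage of matching top lines" means the overlap between the top fraction of lines ranked by observed values and the top fraction ranked by predicted values. This summary does not say whether the reported gains are relative changes or percentage-point differences, so `relative_gain_pct` shows only the relative form.

```python
import math


def pearson(x, y):
    """Pearson's correlation between observed and predicted values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def top_match_pct(observed, predicted, fraction):
    """Percentage of lines shared by the observed and predicted top sets.

    Assumption: the top set holds the `fraction` of lines with the
    largest values (e.g. 0.10 or 0.20), with at least one line.
    """
    n = len(observed)
    k = max(1, round(n * fraction))
    top_obs = set(sorted(range(n), key=lambda i: observed[i], reverse=True)[:k])
    top_pred = set(sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k])
    return 100.0 * len(top_obs & top_pred) / k


def relative_gain_pct(baseline, improved):
    """Relative change of a metric, in percent of the baseline value."""
    return 100.0 * (improved - baseline) / abs(baseline)


# Toy values for illustration only; they are not data from the paper.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 9.0]
r = pearson(observed, predicted)
match_top20 = top_match_pct(observed, predicted, 0.20)
```

With these toy vectors the two best lines are swapped in the prediction, so the top 20% sets still coincide while the top 10% sets do not, which shows why the two thresholds can report different levels of agreement.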

## Abstract

Background/Objectives: Improving sparse testing is essential for enhancing the efficiency of genomic prediction (GP). Accordingly, new strategies are being explored to refine genomic selection (GS) methods under sparse testing conditions. Methods: In this study, a sparse testing approach was evaluated, specifically in the context of predicting performance for tested lines in untested environments. Sparse testing is particularly practical in large-scale breeding programs because it reduces the cost and logistical burden of evaluating every genotype in every environment, while still enabling accurate prediction through strategic data use. To achieve this, we used training data from CIMMYT (Obregon, Mexico), along with partial data from India, to predict line performance in India using observations from Mexico. Results: Our results show that incorporating data from Obregon into the training set improved prediction accuracy, with greater effectiveness when the data were temporally closer. Across environments, Pearson’s correlation improved by at least 219% (in a testing proportion of 50%), while gains in the percentage of matching in top 10% and 20% of top lines were 18.42% and 20.79%, respectively (also in a testing proportion of 50%). Conclusions: These findings emphasize that enriching training data with relevant, temporally proximate information is key to enhancing genomic prediction performance; conversely, incorporating unrelated data can reduce prediction accuracy.
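
The sparse testing design described in the abstract can be sketched as a data partition. The code below is a hedged illustration under stated assumptions, not the paper's procedure: the record layout (`line`, `env`, `y`), the environment labels `"India"` and `"Obregon"`, and the function `sparse_testing_split` are all hypothetical. It holds out a proportion of lines in the target environment as the testing set and optionally enriches the training set with records from donor environments.

```python
import random


def sparse_testing_split(records, target_env, test_fraction, donor_envs=(), seed=0):
    """Split records into training and testing sets for sparse testing.

    records: list of dicts with keys 'line', 'env', 'y' (hypothetical layout).
    A fraction of the lines in `target_env` is held out for testing; the
    remaining target records plus all records from `donor_envs` form the
    training set. Records from any other environment are ignored.
    """
    rng = random.Random(seed)
    target_lines = sorted({r["line"] for r in records if r["env"] == target_env})
    n_test = round(len(target_lines) * test_fraction)
    test_lines = set(rng.sample(target_lines, n_test))

    train, test = [], []
    for r in records:
        if r["env"] == target_env:
            (test if r["line"] in test_lines else train).append(r)
        elif r["env"] in donor_envs:
            train.append(r)
    return train, test


# Toy data: 10 lines observed in both environments (values are placeholders).
records = [
    {"line": f"L{i}", "env": env, "y": float(i)}
    for env in ("India", "Obregon")
    for i in range(10)
]

# Baseline: train only on the India lines that were not held out.
train_base, test_base = sparse_testing_split(records, "India", 0.5)

# Enriched: additionally use every Obregon record for training.
train_rich, test_rich = sparse_testing_split(
    records, "India", 0.5, donor_envs=("Obregon",)
)
```

Because the same seed is used, both calls hold out the same India lines, so a model trained on `train_base` and one trained on `train_rich` can be compared on an identical testing set. In the enriched case the held-out lines still appear in training through their Obregon records, which corresponds to the "tested lines in untested environments" setting.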

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12294251/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12294251/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/PMC12294251/full.md

---
Source: https://tomesphere.com/paper/PMC12294251