# Optimizing training sets to identify superior genotypes in hybrid populations

**Authors:** Szu-Ping Chen, Chen-Tuo Liao

PMC · DOI: 10.3389/fpls.2025.1699491 · 2026-01-15

## TL;DR

This paper evaluates and extends training set optimization methods for genomic selection, with the goal of identifying top-performing hybrid genotypes in plant breeding.

## Contribution

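The study evaluates and extends three training set optimization methods for genomic selection in hybrid populations (MSPE(v2)Ridge, CDmean(v2), and GVaverage), and develops three metrics for assessing how well a model identifies the genotypes with the highest true breeding values.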

## Key findings

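- GVaverage is computationally efficient and generally produces highly informative training sets across a broad range of sizes, but it occasionally fails to maintain adequate genomic diversity when the training set is small.
- CDmean(v2) is recommended as the more reliable alternative for small training sets, where GVaverage can lose genomic diversity.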
- The proposed framework improves genomic prediction accuracy in hybrid breeding programs.

## Abstract

The identification of superior hybrids from candidate populations is a central goal in plant breeding, particularly for commercial applications and large-scale cultivation. In this study, several promising training set optimization methods in genomic selection (GS) are evaluated and extended to construct predictive models for the identification of top-performing genotypes in hybrid populations. The methods investigated include: (i) a ridge regression-based approach, MSPE(v2)Ridge, (ii) a generalized coefficient of determination-based method, CDmean(v2), and (iii) an A-optimality-like ranking strategy, GVaverage. To assess predictive performance in identifying genotypes with the highest true breeding values (TBVs), three evaluation metrics were developed. Since TBVs are latent quantities derived from models, simulation experiments based on real genotype data from wheat (Triticum aestivum L.), maize (Zea mays), and rice (Oryza sativa L.) were carried out to assess the proposed methods. Results demonstrated that GVaverage not only achieved substantial computational efficiency but also generally generated highly informative training sets across a broad range of sizes. However, when constructing small training sets, GVaverage occasionally failed to maintain adequate genomic diversity. In such cases, CDmean(v2) is recommended as a more reliable alternative. Overall, the proposed framework provides a flexible and effective approach to optimizing training sets for hybrid breeding, thereby enhancing the accuracy of genomic prediction in practical breeding programs.
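The abstract does not define the three evaluation metrics, so the following is only a hypothetical illustration of the kind of quantity involved in "identifying genotypes with the highest TBVs". It is a minimal sketch that scores predictions by the fraction of the true top-k genotypes that also rank in the predicted top k. The function name `top_k_recovery`, the choice of k, and the toy data are assumptions for illustration and are not taken from the paper.

```python
import random


def top_k_recovery(true_values, predicted_values, k):
    """Fraction of the k genotypes with the highest true values that
    also appear among the k genotypes with the highest predicted values.

    Hypothetical metric for illustration; not one of the paper's metrics.
    """
    if len(true_values) != len(predicted_values):
        raise ValueError("true_values and predicted_values must have equal length")
    if not 1 <= k <= len(true_values):
        raise ValueError("k must be between 1 and the number of genotypes")

    def top_k(values):
        # Indices of the k largest entries (ties broken by original order).
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return set(order[:k])

    return len(top_k(true_values) & top_k(predicted_values)) / k


# Toy usage: simulated "true" breeding values and noisy predictions of them.
rng = random.Random(0)
tbv = [rng.gauss(0.0, 1.0) for _ in range(200)]
predicted = [v + rng.gauss(0.0, 0.5) for v in tbv]
print(top_k_recovery(tbv, predicted, k=20))
```

In a simulation study of the kind the abstract describes, a score like this could be computed for models trained on different candidate training sets, so that the sets are compared by how well they recover the best hybrids rather than by overall prediction accuracy alone.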

## Linked entities

- **Species:** Zea mays (taxon 4577)

## Full-text entities

- **Species:** Triticum aestivum (bread wheat, species) [taxon 4565], Oryza sativa (Asian cultivated rice, species) [taxon 4530], Zea mays (maize, species) [taxon 4577]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12854142/full.md

---
Source: https://tomesphere.com/paper/PMC12854142