# Optimizing training sets for genomic selection to identify superior genotypes across multiple environments

**Authors:** Zi-Jie Liu, Chen-Tuo Liao

PMC · DOI: 10.1093/g3journal/jkag031 · G3: Genes | Genomes | Genetics · 2026-02-10

## TL;DR

This paper compares CD-based methods for optimizing genomic selection training sets in multi-environment trials and recommends CDmean(v2) for identifying genotypes that perform well across environments.

## Contribution

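The study builds training set optimization for multi-environment trials on a GS model with fixed environment-specific means, random additive genetic effects, and random additive G × E interaction effects. It evaluates two existing CD-based methods, CDmean(v2) (Chen et al. 2024) and CDmean.MET (Rio et al. 2022), against random sampling using selection-focused ranking metrics, and recommends CDmean(v2).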

## Key findings

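- Among CDmean(v2), CDmean.MET, and random sampling, CDmean(v2) consistently showed high efficiency in identifying top-performing genotypes.
- Implemented with the optimization algorithm in the TrainSel package (Akdemir et al. 2021), CDmean(v2) produced superior training sets at reasonable computational cost.
- Because true breeding values are unobservable, the comparison relied on simulation experiments built on real genotype data from rice, barley, and maize.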

## Abstract

Genomic selection (GS) is a promising strategy in plant breeding for identifying superior genotypes with high true breeding values (TBVs) across multiple environments. However, the relative performance of candidate genotypes often varies due to complex genotype-by-environment (G × E) interactions in multienvironment trials (METs). To address this challenge, we employed a GS prediction model incorporating fixed environment-specific means, random additive genetic effects, and random additive G × E interaction effects to develop training set optimization methods for GS in METs. Two optimization methods derived from the generalized coefficient of determination (CD) criterion—CDmean(v2) (Chen et al. 2024, equivalent to Rincent et al. 2012) and CDmean.MET (Rio et al. 2022)—were evaluated and compared with random sampling. Rather than relying on prediction accuracy–focused correlation metrics, we assessed training set performance using selection-focused ranking metrics, including normalized discounted cumulative gain, Spearman's rank correlation, and rank sum ratio. Because TBVs are latent and unobservable, simulation experiments were conducted using real genotype data from diverse crop datasets, including rice (Oryza sativa L.), barley (Hordeum vulgare L.), and maize (Zea mays L.). Among the evaluated approaches, CDmean(v2) consistently showed high efficiency in identifying top-performing genotypes. In practice, CDmean(v2), implemented using the optimization algorithm provided in the TrainSel package (Akdemir et al. 2021), is recommended for GS-assisted breeding programs, as it produced superior training sets for identifying elite genotypes with reasonable computational cost.
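The selection-focused metrics named in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the function names, the choice to shift gains so they are non-negative, and the example values are assumptions made for illustration. The rank sum ratio is left out because its exact definition is not given in this summary.

```python
import math

def ndcg_at_k(true_values, predicted_values, k):
    """NDCG@k: how well the top-k predicted genotypes recover high true values.

    Assumption: gains are the true values shifted so the minimum is 0.
    The paper may use a different gain function.
    """
    n = len(true_values)
    low = min(true_values)
    gains = [t - low for t in true_values]
    by_pred = sorted(range(n), key=lambda i: -predicted_values[i])[:k]
    by_true = sorted(range(n), key=lambda i: -true_values[i])[:k]
    dcg = sum(gains[i] / math.log2(r + 2) for r, i in enumerate(by_pred))
    idcg = sum(gains[i] / math.log2(r + 2) for r, i in enumerate(by_true))
    return dcg / idcg if idcg > 0 else 0.0

def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks

def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / math.sqrt(va * vb)

# Hypothetical simulated true breeding values and predictions for 5 genotypes.
tbv = [2.0, 0.5, 1.5, -1.0, 0.0]
gebv = [1.8, 0.4, 1.6, -0.7, 0.1]
print(ndcg_at_k(tbv, gebv, k=2), spearman(tbv, gebv))
```

NDCG@k weights the top of the ranking most heavily, which matches a breeder's interest in the few best genotypes, whereas Spearman's correlation scores agreement over the whole ranking.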

Selecting superior genotypes that perform well across diverse environments is a critical goal in plant breeding. Genomic selection (GS) has emerged as an innovative and promising strategy to achieve this objective. In this study, Liu and Liao propose a cost-effective approach for determining optimal training sets to enhance the application of GS in METs. The approach could help plant breeders choose suitable genotypes for selective phenotyping and thereby increase the probability of success of a GS program applied to METs.
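To make the idea of choosing genotypes for selective phenotyping concrete, here is a minimal sketch of a CD-mean-style criterion with a simple random-exchange search. It is neither CDmean(v2) nor CDmean.MET as defined in the paper: it assumes a single environment, no fixed effects, a known variance ratio `lam` (residual over genetic variance), and a simulated relationship matrix `K`. The search is a plain exchange heuristic, not the TrainSel algorithm. All names are hypothetical.

```python
import numpy as np

def mean_cd(K, train_idx, lam):
    """Mean coefficient of determination of the non-phenotyped genotypes.

    Simplifying assumption: no fixed effects, so for candidate i
    CD_i = [K_ct (K_tt + lam*I)^-1 K_tc]_ii / K_ii.
    """
    train_idx = np.asarray(train_idx)
    cand = np.setdiff1d(np.arange(K.shape[0]), train_idx)
    K_tt = K[np.ix_(train_idx, train_idx)]
    K_ct = K[np.ix_(cand, train_idx)]
    solved = np.linalg.solve(K_tt + lam * np.eye(len(train_idx)), K_ct.T)
    explained = np.einsum("ij,ji->i", K_ct, solved)  # diagonal of K_ct @ solved
    return float(np.mean(explained / np.diag(K)[cand]))

def exchange_search(K, n_train, lam, n_iter=200, seed=0):
    """Random-exchange heuristic: swap one genotype at a time, keep improvements."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    train = rng.choice(n, size=n_train, replace=False)
    best = mean_cd(K, train, lam)
    for _ in range(n_iter):
        outside = np.setdiff1d(np.arange(n), train)
        proposal = train.copy()
        proposal[rng.integers(n_train)] = rng.choice(outside)
        score = mean_cd(K, proposal, lam)
        if score > best:
            train, best = proposal, score
    return np.sort(train), best

# Hypothetical marker matrix (40 genotypes, 300 markers coded -1/+1).
markers = np.random.default_rng(1).choice([-1.0, 1.0], size=(40, 300))
K = markers @ markers.T / markers.shape[1]

train_set, score = exchange_search(K, n_train=10, lam=1.0)
print(train_set, round(score, 4))
```

The returned indices are the genotypes that would be phenotyped. A multi-environment version would need the G × E covariance structure described in the abstract, which this sketch does not model.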

## Full-text entities

- **Species:** Zea mays (maize, species) [taxon 4577], Oryza sativa (Asian cultivated rice, species) [taxon 4530], Hordeum vulgare (barley, species) [taxon 4513]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13042293/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13042293/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC13042293/full.md

---
Source: https://tomesphere.com/paper/PMC13042293