Optimizing genomic predictions in maize using a diversity panel and a multiparental population

A. López‐Malvar; R. Santiago; A. Butrón; R. A. Malvar; N. Gesteiro

PMC · DOI: 10.1002/tpg2.70206 · February 25, 2026

Open Access

TL;DR

The study shows that genomic selection in maize is more accurate within similar populations and less reliable when applied across genetically diverse groups.

Contribution

The study evaluates genomic prediction accuracy in two maize populations and highlights the importance of genetic relatedness in training sets for effective genomic selection.

Findings

01

Higher predictive ability was observed in the diversity panel compared to the MAGIC population for most traits.

02

Cross-population prediction had very low accuracy due to differences in allele frequencies and linkage disequilibrium.

03

Combining both populations in training did not improve prediction accuracy and sometimes reduced it.

Abstract

Genomic selection allows the prediction of genetic values using SNP markers distributed across the genome. Its effectiveness depends on factors such as trait heritability, genetic similarity between training and validation sets, and population structure. Although results in homogeneous populations have been promising, its application in diverse germplasm remains a challenge. This study evaluates the predictive capacity of genomic best linear unbiased prediction models applied to agronomic and biochemical‐structural traits related to stover quality in two maize populations: a diversity panel and a multiparental advanced generation inter‐cross (MAGIC) population. Higher heritability was observed in the panel, especially for flowering traits (h² ≥ 0.88), with high intra‐population predictive abilities (PA = 0.15–0.75) for most traits, compared with the MAGIC population (PA = 0.14–0.37). However, when…

Figures

Schematic of validation approaches used to assess the predictive ability of genomic best linear unbiased prediction (GBLUP) models. (A) Within‐population cross‐validation (20 folds) used to assess the predictive ability of the GBLUP models in the Ames panel (blue) and the multiparent advanced generation intercross (MAGIC) population (orange). A 10‐fold cross‐validation was implemented for each population (folds 1–10 for Ames and folds 11–20 for MAGIC). In each iteration, one fold (10% of the lines of the corresponding population) was excluded to form the test set, and two models were fitted: a specific model (r_w), trained with the remaining 90% of that population, and a combined model (r_x), trained with the same 90% plus 100% of the other population. (B) In the cross‐population validation, models were trained using only one of the populations (Ames or MAGIC) and validated on the other.

Scree plot of the explained variance by the principal components in (A) the Ames association panel lines and (B) the multiparent advanced generation intercross (MAGIC) population.

Principal component analysis (PCA) showing the distribution of the Ames association panel lines based on (A) their breeding program and (B) germplasm group, (C) the multiparent advanced generation intercross (MAGIC) population, and (D) the combined visualization of both populations, using genotyping‐by‐sequencing (GBS) data.

Scatter plots of observed versus predicted values for (A) male flowering, (B) female flowering, (C) grain yield, and (D) stover yield. Three genomic best linear unbiased prediction (GBLUP) models are shown: r_x (green circles), trained with all lines (panel + multiparent advanced generation intercross [MAGIC]) and validated in the corresponding test set; r_w (panel; blue triangles), trained and validated within the panel; and r_w (MAGIC; orange squares), trained and validated within MAGIC.

Scatter plots of observed versus predicted values for (A) digestibility of organic matter, (B) acid and (C) neutral detergent fiber, and (D) saccharification efficiency. Three genomic best linear unbiased prediction (GBLUP) models are shown: r_x (green circles), trained with all lines (panel + multiparent advanced generation intercross [MAGIC]) and validated in the corresponding test set; r_w (panel; blue triangles), trained and validated within the panel; and r_w (MAGIC; orange squares), trained and validated within MAGIC.

Scatter plots of observed versus predicted values for (A) p‐coumarate, (B) ferulate, (C) diferulate (DFA) 5‐5, (D) DFA 8‐O‐4, (E) DFA 8‐5, and (F) total diferulates (DFAT). Three genomic best linear unbiased prediction (GBLUP) models are shown: r_x (green circles), trained with all lines (panel + multiparent advanced generation intercross [MAGIC]) and validated in the corresponding test set; r_w (panel; blue triangles), trained and validated within the panel; and r_w (MAGIC; orange squares), trained and validated within MAGIC.

Tables

TABLE 1. Summary of phenotypic trait availability, by population and publication status, including methodological references.

Trait	Panel (data source)	MAGIC (data source)	Method reference
Anthesis and silking	Not previously published	Jiménez‐Galindo et al. (2019)	Jiménez‐Galindo et al. (2019)
Grain Yield	Gesteiro et al. (2023)	Jiménez‐Galindo et al. (2019)	Gesteiro et al. (2023)
Stover Yield	Gesteiro et al. (2023)	López‐Malvar, Butron, et al. (2021)	Gesteiro et al. (2023)
Digestibility of organic matter, acid and neutral detergent fiber	Gesteiro, Malvar, Butrón, Holland, López‐Malvar, et al. (2025)	López‐Malvar, Butron, et al. (2021)	López‐Malvar, Butron, et al. (2021)
Saccharification efficiency	Gesteiro et al. (2023)	López‐Malvar, Butron, et al. (2021)	Gómez et al. (2010)
Cell wall‐bound hydroxycinnamates	Gesteiro, Malvar, Butrón, Holland, Souto, et al. (2025)	López‐Malvar et al. (2022)	Santiago et al. (2018)

TABLE 2. Variation range and heritability (h²) values of Ames association panel and multiparent advanced generation intercross (MAGIC) inbreds for agronomic traits: days to anthesis, days to silking, and grain and stover yield (g/plant).

Population	Anthesis range (days)	Anthesis h²	Silking range (days)	Silking h²	Grain yield range (g/plant)	Grain yield h²	Stover yield range (g/plant)	Stover yield h²
Panel ^a	74.78–99.12	0.91 ± 0.01	73.28–99.03	0.88 ± 0.02	6.50–140.77	0.61 ± 0.05	14.45–148.31	0.59 ± 0.05
MAGIC ^b	65.22–92.22	0.62 ± 0.01	65.59–102.25	0.70 ± 0.01	2.69–157.21	0.59 ± 4.6E‐4	5.85–141.41	0.58 ± 0.05

TABLE 3. Variation range and heritability (h²) values of Ames association panel and multiparent advanced generation intercross (MAGIC) inbreds for stover quality‐related traits: digestibility of organic matter (DOM), acid detergent fiber (ADF), and neutral detergent fiber (NDF) (%), and saccharification efficiency (SACC; nmol glucose/mg/h).

Population	DOM range (%)	DOM h²	ADF range (%)	ADF h²	NDF range (%)	NDF h²	SACC range (nmol glucose/mg/h)	SACC h²
Panel ^a	45.9–63.3	0.61 ± 0.05	43.1–57.7	0.63 ± 0.05	57.8–73.8	0.46 ± 0.06	85.9–184.8	0.18 ± 0.11
MAGIC ^b	47.8–63.0	0.58 ± 0.04	40.7–59.4	0.54 ± 0.06	53.6–77.5	0.45 ± 0.06	118.5–189.5	0.05 ± 0.08

TABLE 4. Variation range and heritability (h²) values of Ames association panel and multiparent advanced generation intercross (MAGIC) inbreds for cell wall‐bound hydroxycinnamates (mg/g).

Population	p‐Coumarate range	p‐Coumarate h²	Ferulate range	Ferulate h²	DFA 5‐5 range	DFA 5‐5 h²	DFA 8‐O‐4 range	DFA 8‐O‐4 h²	DFA 8‐5 range	DFA 8‐5 h²	Total diferulates range	Total diferulates h²
Panel ^a	3.9–11.1	0.67 ± 0.04	1.8–4.6	0.64 ± 0.05	0.07–0.29	0.52 ± 0.06	0.08–0.52	0.45 ± 0.08	0.14–0.65	0.48 ± 0.06	0.31–1.30	0.47 ± 0.07
MAGIC ^b	4.4–10.9	0.59 ± 0.04	1.9–5.1	0.60 ± 0.04	0.10–0.43	0.36 ± 0.06	0.13–0.62	0.42 ± 0.06	0.00–0.56	0.32 ± 0.06	0.29–1.38	0.40 ± 0.06

TABLE 5. Predictive ability and accuracy (in parentheses) estimates of genomic best linear unbiased prediction (GBLUP) models for agronomic traits: male flowering (anthesis), female flowering (silking), grain yield, and stover yield, using different training groups (PANEL, multiparent advanced generation intercross [MAGIC], and the combined dataset of all lines, PANEL + MAGIC), and validated on the Ames association panel inbreds (PANEL) or the MAGIC population (MAGIC).

Validation group: PANEL
Training group	Anthesis	Silking	Grain yield	Stover yield
PANEL	0.75 (0.79)	0.75 (0.80)	0.39 (0.50)	0.63 (0.82)
MAGIC	0.19 (0.20)	0.15 (0.16)	0.33 (0.42)	0.36 (0.47)
PANEL + MAGIC	0.74 (0.78)	0.72 (0.77)	0.41 (0.52)	0.62 (0.81)

TABLE 6. Predictive ability and accuracy (in parentheses) of genomic best linear unbiased prediction (GBLUP) models for stover quality‐related traits: digestibility of organic matter (DOM), acid detergent fiber (ADF), neutral detergent fiber (NDF), and saccharification efficiency (SACC), using different training groups (PANEL, multiparent advanced generation intercross [MAGIC], and the combined dataset of all lines, PANEL + MAGIC), and validated on the Ames association panel inbreds (PANEL) or the MAGIC population (MAGIC).

Validation group: PANEL
Training group	DOM	ADF	NDF	SACC
PANEL	0.15 (0.20)	0.30 (0.38)	0.32 (0.47)	0.22 (0.52)
MAGIC	0	0	0	0
PANEL + MAGIC	0.18 (0.23)	0.29 (0.36)	0.30 (0.44)	0.22 (0.52)

TABLE 7. Predictive ability and accuracy (in parentheses) of genomic best linear unbiased prediction (GBLUP) models for cell wall‐bound hydroxycinnamates, using different training groups (PANEL, multiparent advanced generation intercross [MAGIC], and the combined dataset of all lines, PANEL + MAGIC), and validated on the Ames association panel inbreds (PANEL) or the MAGIC population (MAGIC).

Validation group: PANEL
Training group	p‐Coumarate	Ferulate	DFA 5‐5	DFA 8‐O‐4	DFA 8‐5	DFAT
PANEL	0.48 (0.59)	0.37 (0.46)	0.32 (0.44)	0.34 (0.51)	0.17 (0.25)	0.32 (0.47)
MAGIC	0.04 (0.05)	0.11 (0.14)	0	0	0	0
PANEL + MAGIC	0.44 (0.54)	0.38 (0.47)	0.28 (0.38)	0.33 (0.49)	0.13 (0.19)	0.29 (0.42)

Funding

ANR (10.13039/501100001665)
MCIU/AEI/FEDER, UE

Full text

1 INTRODUCTION

Genomic selection, as proposed by Meuwissen et al. (2001), uses high‐density single nucleotide polymorphism (SNP) genotypes to predict genomic estimated breeding values (GEBVs). Unlike methods that consider only markers significantly associated with a trait, this approach fits all markers simultaneously, so that each GEBV is the sum of the individual marker effects.
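
As a minimal illustration of this additive idea, the Python sketch below computes GEBVs as a genotype matrix multiplied by a vector of marker effects. The three lines, four SNPs, their −1/+1 coding and the effect values are all hypothetical toy numbers, not estimates from this study.

```python
import numpy as np

# Hypothetical toy genotypes: 3 lines x 4 SNPs, homozygotes coded -1/+1.
M = np.array([[ 1, -1,  1,  1],
              [-1, -1,  1, -1],
              [ 1,  1, -1,  1]])

# Hypothetical estimated marker (allele-substitution) effects, one per SNP.
a_hat = np.array([0.5, -0.2, 0.1, 0.3])

# GEBV of each line = sum over all SNPs of genotype code x marker effect.
gebv = M @ a_hat
print(gebv)  # approximately [1.1, -0.5, 0.5]
```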

To implement genomic prediction, a reference population of individuals with both phenotypes and genotypes is usually required. Marker effects are estimated in this population and then used to predict GEBVs in a selection group of individuals for which only genotypic data are available (Wientjes et al., 2015). This method is particularly useful for traits that are costly or difficult to measure, although its accuracy may decrease when the genetic distance between the training group and the prediction group is large (Pszczola et al., 2012).

In parallel with the development of predictive models, considerable effort has been invested in designing indicators of prediction accuracy. These metrics allow breeders to assess whether a generic training set (TS) can predict genetic values in a specific population, and to optimize both TS composition and breeding programs (Rio et al., 2019). Ideally, a training set of genetically diverse individuals would substantially reduce costs by extending the applicability of the model to broader populations (Hayes et al., 2009). However, genomic prediction models are usually developed under the assumption of a homogeneous population, which rarely holds in practice. Genetic structure, driven by differences in allele frequencies between subgroups and the ancestral population, can limit predictive ability. In maize, for example, structure is observed both between heterotic groups, which have been selected for complementarity to maximize heterosis, and within these groups (Rincent et al., 2014). This heterogeneity may cause quantitative trait locus (QTL) effects to differ across individuals, thus decreasing prediction accuracy in diverse populations (Albrecht et al., 2014; Goddard & Hayes, 2007; Guo et al., 2014). Most published studies assume a single population, which implies that QTL effects are conserved across individuals; further validation experiments are therefore needed to determine whether the high prediction accuracies reported can be achieved in populations other than those in which the GEBVs were estimated (Goddard & Hayes, 2007, 2011; Rio et al., 2019).

In general, the accuracy of genomic best linear unbiased prediction (GBLUP) depends on key factors such as the size of the TS; the degree of linkage disequilibrium (LD) between SNPs and QTL, which may vary across populations; the heritability of the trait; and the availability of information from close relatives (Crossa et al., 2017; Habier et al., 2010; Saatchi et al., 2011; Wientjes et al., 2015; Windhausen et al., 2012). Consequently, prediction accuracy for individuals outside the reference population tends to be limited, which raises a key question for breeders: can the accuracy of standard models such as GBLUP be improved by combining individuals from different groups in the training set?

Another factor that can affect the accuracy of genomic prediction across populations is the number of QTLs underlying the trait. Within‐population studies have shown that, with best linear unbiased prediction methods such as GBLUP, accuracy is independent of the number of QTLs, as long as no QTL explains an extremely large part of the genetic variance (Wientjes et al., 2015). However, these analyses assessed the effect of the number of QTLs only within a population, without addressing its impact on predictions between populations (Wientjes et al., 2015).

In this regard, our group has conducted extensive studies using two different mapping populations: a diversity panel and a structured multiparental population. Our results show that QTL mapping in diversity panels and in structured multiparental populations provides complementary insights into the genetic architecture of economically important traits in maize. Each approach has distinct advantages. Diversity panels maximize genetic variability and enable high‐resolution QTL mapping owing to rapid LD decay. However, their underlying genetic structure can lead to false positives, requiring statistical corrections (J. Yu et al., 2006). In contrast, structured populations such as multiparent advanced generation inter‐cross (MAGIC) populations have more balanced allele frequencies, because all founders contribute equally, and their mapping power is enhanced by high minor allele frequencies (Cockram & Mackay, 2018; J. Yu et al., 2006).

Using these two complementary approaches, our association mapping studies have identified key genomic regions associated with maize performance and cell wall composition, confirming the quantitative inheritance of these traits (Gesteiro et al., 2023; Jiménez‐Galindo et al., 2019; López‐Malvar, Butron, et al., 2021; López‐Malvar, Malvar, et al., 2021; López‐Malvar et al., 2022). However, the low percentage of variance explained by individual markers and the absence of major QTL with strong effects rule out marker‐assisted selection and suggest that genomic selection is the most effective breeding strategy for improving maize productivity and cell wall traits. Genomic selection does not require prior identification of QTL; instead, it estimates all marker effects across the genome simultaneously.

Accordingly, the objective of this study is to develop genomic prediction models and assess their predictive capacity for agronomic and biochemical‐structural traits of high economic interest in maize that differ in range of variation and heritability. Specifically, models will be built using data from either a MAGIC population or a diversity panel and validated both within the same population and on the other population, to assess whether models developed in one population can predict lines from a completely different population. Additionally, we will explore whether combining both populations in a joint model improves predictive ability within each population.

Core Ideas

The accuracy of genomic prediction is strongly influenced by trait heritability and the genetic composition of the training and validation populations.
Differences in linkage disequilibrium and allele frequencies between populations reduce the transferability of predictive markers between populations.
Effective genomic selection in diverse germplasm requires training sets that balance diversity and relatedness, population structure, and biological context.

2 MATERIALS AND METHODS

2.1 Plant material

We used two populations of inbreds with different genetic diversity and population structure: (i) recombinant inbred lines (RILs) from a MAGIC population and (ii) inbred lines belonging to the Ames association panel. The MAGIC population was developed by the Genetics and Maize Breeding group of the Misión Biológica de Galicia (CSIC) by crossing eight parental inbred lines of temperate maize with diverse genetic origins (Butrón et al., 2019; Jiménez‐Galindo et al., 2019). Six of these inbred lines derive from European germplasm (EP17, EP43, EP53, EP86, PB130, and F473), A509 comes from North American germplasm, and EP125 is of unknown origin. All parental lines can be classified as non‐stiff stalk germplasm. The Ames association panel was provided by the United States Department of Agriculture North Central Regional Plant Introduction Station in Iowa, which maintains more than 14,000 maize accessions from around the world. Most lines in the panel fall into the two primary germplasm groups recognized by temperate maize breeders: non‐stiff stalk and stiff stalk. The panel also includes materials from international breeding programs (e.g., Spain, France, China, Argentina, Australia), which appear to represent germplasm groups distinct from those commonly used in commercial breeding programs.

The field trials used to evaluate 378 RILs from the MAGIC population, together with the eight parents (EP17, EP43, EP53, EP86, PB130, F473, A509, and EP125), in 2016 and 2017 are described in López‐Malvar, Butron, et al. (2021), López‐Malvar, Malvar, et al. (2021), and López‐Malvar et al. (2022), while the evaluation of the 238 Ames panel lines in 2018 and 2019 is described in Gesteiro et al. (2023), Gesteiro, Malvar, Butrón, Holland, López‐Malvar, et al. (2025), and Gesteiro, Malvar, Butrón, Holland, Souto, et al. (2025).

The phenotypic data used in this study have been partially published previously (Gesteiro et al., 2023; Gesteiro, Malvar, Butrón, Holland, López‐Malvar, et al., 2025; Gesteiro, Malvar, Butrón, Holland, Souto, et al., 2025; López‐Malvar, Butron, et al., 2021; López‐Malvar, Malvar, et al., 2021; López‐Malvar et al., 2022; Table 1; Table S1). Days to female and male flowering, which are reported here for the first time for both populations, were recorded as the number of days from sowing until approximately 50% of the plants showed visible silks (silking) or extruded anthers (anthesis). The genome‐wide association study (GWAS) results for these flowering traits are provided in the Supporting Information (Tables S2–S3, Figure S1). GWAS results for most other traits in the diversity panel were reported in the publications cited above and are summarized in the Supporting Information (Table S4).

2.2 Genotypic data

The inbred lines of both populations were genotyped by genotyping‐by‐sequencing (GBS), using B73 reference genome version 4 for the Ames panel and version 2 for the MAGIC population (Jiménez‐Galindo et al., 2019; Romay et al., 2013). The genotypes of both populations were then updated to version 5 using the reference sequences available in the MaizeGDB database (https://www.maizegdb.org/). The genotypic matrices of the Ames panel and the MAGIC population were integrated in TASSEL 5.2.54: markers were aligned so that positions matched between the two matrices, which were then merged into a single genotypic dataset. Markers with more than 10% missing data (counting heterozygotes as missing) were then discarded. Markers with a minor allele frequency below 0.05, monomorphic and multiallelic SNPs, and insertion/deletion polymorphisms were also removed. The final genotypic matrix comprised 238 lines from the Ames panel, 378 RILs from the MAGIC population, and 12,829 SNPs.
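
The marker filters listed above can be expressed as a short Python sketch. It is not TASSEL's implementation: it assumes a hypothetical numeric coding (0 and 2 for the two homozygotes, 1 for heterozygotes, NaN for missing calls) and covers only the missing‐data, minor‐allele‐frequency and monomorphism filters; multiallelic SNPs and indels would have to be removed before a matrix of this kind is built.

```python
import numpy as np

def filter_snps(geno, max_missing=0.10, min_maf=0.05):
    """Boolean mask of SNP columns that pass the filters.

    geno: lines x SNPs array, 0/2 = homozygotes, 1 = heterozygote,
    NaN = missing call (hypothetical coding used for this sketch).
    """
    g = np.asarray(geno, dtype=float).copy()
    g[g == 1] = np.nan                          # heterozygotes counted as missing
    missing_rate = np.isnan(g).mean(axis=0)     # share of missing calls per SNP
    p = np.nanmean(g, axis=0) / 2.0             # frequency of the allele coded 2
    maf = np.minimum(p, 1.0 - p)                # monomorphic SNPs give maf = 0
    return (missing_rate <= max_missing) & (maf >= min_maf)

# Toy matrix of 20 lines and 5 SNPs
n = 20
geno = np.column_stack([
    [0] * 10 + [2] * 10,               # MAF 0.5, complete             -> keep
    [0] * n,                           # monomorphic                   -> drop
    [0] * 18 + [2] * 2,                # MAF 0.1                       -> keep
    [np.nan] * 3 + [0] * 8 + [2] * 9,  # 15% missing                   -> drop
    [1] * 2 + [0] * 9 + [2] * 9,       # 10% heterozygous (= missing)  -> keep
])
keep = filter_snps(geno)
print(keep)
```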

The filtered genotypic matrix was converted into numerical values using the “numerical genotypes” option of TASSEL 5.2.54 (Bradbury et al., 2007). In this matrix, denoted M, rows represent individuals and columns represent loci, and the values 2 and 0 represent homozygotes for the less frequent and the more frequent allele, respectively. To simplify the analysis, one was subtracted from every element of M in R, yielding a matrix coded 1 and −1.
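
A minimal sketch of this recoding, using a hypothetical 3 × 3 matrix with the 0/2 coding described above. Because heterozygotes were treated as missing during filtering, no 0 values appear after the shift; the {−1, 0, 1} scale is the input coding documented for rrBLUP's A.mat.

```python
import numpy as np

# Hypothetical numeric genotypes: 2 = homozygous for the less frequent allele,
# 0 = homozygous for the more frequent allele (rows = lines, columns = loci).
M = np.array([[2, 0, 0],
              [0, 0, 2],
              [2, 2, 0]])

# Subtract one from every element: 2 -> 1 and 0 -> -1.
M_recoded = M - 1
print(M_recoded)
```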

Missing data were handled with the A.mat function of the rrBLUP package in R (http://cran.r-project.org/web/packages/rrBLUP; Endelman, 2011), which replaced each missing value with the population mean of the corresponding marker.
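
The imputation step can be sketched in Python as follows. It mimics the marker‐mean behaviour described above but is not rrBLUP code, and the toy matrix is hypothetical; the mean is taken over whichever lines are present in the matrix passed in.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of its marker (column)."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)       # per-marker mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))      # positions of missing calls
    X[rows, cols] = col_means[cols]
    return X

# Hypothetical -1/+1 matrix with two missing calls
X = np.array([[ 1, -1, np.nan],
              [-1, -1,  1],
              [ 1, np.nan, 1],
              [ 1,  1, -1]])
X_imp = mean_impute(X)
print(np.round(X_imp, 3))
```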

Finally, the genomic relationship matrix G was calculated in R with the rrBLUP package, based on the covariances derived from the numerical marker values in M. G quantifies the genetic similarity between individuals based on their genotypes, and it was used in the genomic prediction models to capture the combined effects of all markers across the genome.
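
The computation can be sketched with the first method of VanRaden (2008), which is also the formula given in the rrBLUP documentation for A.mat (here without its optional allele‐frequency filtering or shrinkage): each marker column is centred by twice its allele frequency and the cross‐product is scaled by 2Σp(1−p). The input must already be imputed, and the toy matrix is hypothetical.

```python
import numpy as np

def vanraden_G(X):
    """Genomic relationship matrix from a lines x markers matrix coded -1/0/1
    with no missing values (impute beforehand)."""
    X = np.asarray(X, dtype=float)
    p = (X + 1).mean(axis=0) / 2.0      # frequency of the allele coded +1
    W = X + 1 - 2 * p                   # centre each marker column
    c = 2 * np.sum(p * (1 - p))         # scaling constant 2 * sum p(1 - p)
    return W @ W.T / c

# Hypothetical -1/+1 genotypes for 4 inbred lines and 3 markers
X = np.array([[ 1, -1,  1],
              [-1, -1,  1],
              [ 1,  1, -1],
              [-1,  1, -1]])
G = vanraden_G(X)
print(np.round(G, 3))
```

For fully homozygous lines such as these inbreds, with no imputed values, the diagonal of G averages exactly 2, which is a quick sanity check on the scaling.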

2.3 Statistical analysis

Best linear unbiased estimators (BLUEs) were calculated for each inbred line using the combined data from both populations in a 4‐year analysis (Table S1). Genotypes were considered fixed effects, while years and blocks were treated as random effects. These BLUEs constituted the phenotypic matrix used in the prediction models. Heritabilities of traits were estimated on a mean basis following the method described by Holland et al. (2003).
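
As a worked illustration, a common form of entry‐mean heritability in the sense of Holland et al. (2003) divides the genotypic variance by the variance of a genotype mean across years and replications. The model assumed below (genotype, genotype × year and residual variance components, with n_years years and n_reps replications) and the variance values are hypothetical; the exact model fitted for each population may differ, and standard errors are not computed here.

```python
def entry_mean_h2(var_g, var_gy, var_e, n_years, n_reps):
    """Entry-mean heritability: var_g over the phenotypic variance of a
    genotype mean across n_years years and n_reps replications per year."""
    var_mean = var_g + var_gy / n_years + var_e / (n_years * n_reps)
    return var_g / var_mean

# Hypothetical variance components (not estimates from this study)
h2 = entry_mean_h2(var_g=10.0, var_gy=4.0, var_e=8.0, n_years=2, n_reps=2)
print(round(h2, 3))  # 0.714
```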

2.4 Principal components analysis to study population structure

Population structure was analyzed by principal component analysis (PCA) of the marker data. Among other components, the analysis returns “sdev,” the standard deviation associated with each principal component (PC), and “x,” the matrix of PC scores. The relative importance of each PC was assessed by calculating and plotting the variance it explains.
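
The “sdev” and “x” components correspond to the output of R's prcomp; a rough Python equivalent, assuming PCA on column‐centred marker data via singular value decomposition, is sketched below with random toy genotypes. The proportion of variance explained by each PC (its squared sdev over the total) is what a scree plot displays.

```python
import numpy as np

def pca_scree(X):
    """PCA of a lines x markers matrix by SVD of the column-centred data.

    Returns sdev (standard deviation of each PC), scores (PC scores, like
    prcomp's 'x') and prop_var (proportion of variance per PC)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each marker
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    sdev = s / np.sqrt(X.shape[0] - 1)           # same scaling as prcomp
    scores = U * s                               # equals Xc @ Vt.T
    prop_var = sdev**2 / np.sum(sdev**2)         # values for a scree plot
    return sdev, scores, prop_var

rng = np.random.default_rng(42)
X = rng.choice([-1.0, 1.0], size=(50, 300))      # hypothetical -1/+1 genotypes
sdev, scores, prop_var = pca_scree(X)
print(f"PC1 + PC2 explain {100 * prop_var[:2].sum():.1f}% of the variance")
```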

For the panel, each line was assigned to a breeding group or program according to the information available in the Germplasm Resources Information Network database. Two data frames were created, one associating each line with its breeding program and another associating each line with its germplasm group. These made it possible to visualize the similarity between lines in a two‐dimensional space and to assess whether the GBS data reflected genetic variation consistent with the lines' known ancestry.

2.5 Genomic prediction models

The GBLUP method (VanRaden, 2008) was used to predict the GEBVs for all evaluated individuals belonging to the two populations, using the R programming environment and the rrBLUP package (Endelman, 2011). GBLUP is a statistical method that combines phenotypic and genomic data through the use of a genomic matrix G.

The mixed model used in the GBLUP analysis follows the standard structure:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

where y is the vector of adjusted phenotypes (BLUEs) for a given trait, b is the overall mean fitted as a fixed effect, u is the vector of GEBVs of each individual, assumed to follow N(0, Gσ²_u), and e is the vector of residual errors, assumed to follow N(0, Iσ²_e). X and Z are the incidence matrices of b and u, respectively. This formulation allows GEBVs to be calculated for all individuals included in the G matrix.
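
To make the model concrete, the sketch below computes GBLUP predictions in Python from the standard BLUP equations for this model, u = σ²_u GZ′V⁻¹(y − Xb) with V = σ²_u ZGZ′ + σ²_e I. It is an illustration, not the rrBLUP code used in the study: the variance components are assumed known (rrBLUP's mixed.solve estimates them by REML), the only fixed effect is the overall mean, and Z simply selects the phenotyped lines, so lines without phenotypes still receive a GEBV through their relationships in G.

```python
import numpy as np

def gblup(y, G, train, var_u, var_e):
    """GBLUP predictions for ALL individuals in G from the phenotypes of the
    training individuals only (variance components assumed known)."""
    y = np.asarray(y, dtype=float)
    train = np.asarray(train)
    n_train = train.size
    X = np.ones((n_train, 1))                       # overall mean only
    G_tt = G[np.ix_(train, train)]                  # Z G Z'
    V = var_u * G_tt + var_e * np.eye(n_train)      # var(y)
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    b_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)   # GLS mean
    resid = y - X @ b_hat
    # BLUP of u for every individual: var_u * G Z' V^{-1} (y - X b)
    u_hat = var_u * G[:, train] @ np.linalg.solve(V, resid)
    return b_hat.item(), u_hat

G = np.eye(5)                     # toy case: no genomic relationships at all
mu, u = gblup([1.0, 2.0, 3.0], G, train=[0, 1, 2], var_u=1.0, var_e=1.0)
print(mu, np.round(u, 3))
```

With an identity G, the two unphenotyped lines (indices 3 and 4) get GEBVs of zero because nothing links them to the training set; this is the extreme case of the weak relatedness that limits prediction across populations.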

Three prediction approaches were compared, differing in the populations used for training: the first used only the lines of the Ames panel, the second only the lines of the MAGIC population, and the third all the lines from both populations. This design allowed us to evaluate how combining populations affects the accuracy and reliability of the predictions.

For models that were developed using exclusively Ames panel or MAGIC population data, the genotypic and phenotypic data were first filtered to include only the lines corresponding to each specific population. This means that to analyze the Ames panel, the MAGIC lines were excluded and vice versa.

2.6 Evaluation of the precision of genomic predictions

To assess the predictive ability of genomic predictions within each population, cross‐validation was implemented (Hastie et al., 2009). This method divides the data into training and test sets. The phenotypic and genotypic data from the training set are used to fit the prediction equations, while the phenotypic values of the test set are set to missing (NA). Predictions for the test lines are therefore based solely on genotypic information.

In this study, cross‐validation was implemented using a 10‐fold scheme per population. Folds 1–10 were assigned exclusively to Ames panel lines, while folds 11–20 contained MAGIC population lines. In each iteration, one fold, equivalent to 10% of the lines of the corresponding population, was used as the test set, while the remaining 90% of that population formed the training set (Figure 1A). Each fold was used once as a test set, so all lines were evaluated.
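
One possible way to build the 20 population‐specific folds is sketched in pure Python. The random seed and the round‐robin‐then‐shuffle assignment are assumptions (the text does not say how lines were allocated to folds), as is placing the Ames lines first; the line counts are those reported in Section 2.2.

```python
import random

def assign_folds(n_ames, n_magic, k=10, seed=1):
    """Fold label per line: Ames lines get folds 1..k and MAGIC lines get
    folds k+1..2k, so each test fold holds lines of a single population.
    Assumes the Ames lines come first in the line order."""
    rng = random.Random(seed)            # hypothetical seed, for reproducibility

    def split(n, offset):
        # Round-robin labels give fold sizes that differ by at most one...
        labels = [offset + (i % k) + 1 for i in range(n)]
        rng.shuffle(labels)              # ...and shuffling randomizes membership
        return labels

    return split(n_ames, 0) + split(n_magic, k)

folds = assign_folds(238, 378)            # 238 Ames lines, 378 MAGIC RILs
print(len(folds), min(folds), max(folds))  # 616 1 20
```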

Thus, for folds 1–10, corresponding to the Ames panel, the lines of one fold (10% of the panel) were excluded in each iteration to form the test set. Two models were fitted: a combined model, trained with the remaining Ames panel lines together with all MAGIC lines, and a specific model, trained only with the remaining Ames panel lines. Both models were validated on the same excluded Ames panel lines in each fold. To visualize predictive ability, observed versus predicted values across all folds were plotted for both the combined model (r_x) and the specific Ames panel model (r_w [panel]). Because both models were validated on the same excluded lines, they could be compared directly.

Similarly, for folds 11–20, corresponding to the MAGIC population, the lines of one fold (10% of the MAGIC population) were excluded in each iteration to form the test set, and two models were fitted: a combined model, trained with the remaining MAGIC lines plus all Ames panel lines, and a specific model, trained exclusively with the remaining MAGIC lines. Both models were validated on the same excluded MAGIC lines. Observed versus predicted values across folds were plotted for the combined model (r_x) and the MAGIC‐specific model (r_w [MAGIC]).
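
The index bookkeeping for one iteration can be sketched as follows. The function name and the toy labels are hypothetical; the point is that r_w and r_x share the same test lines and differ only in whether the other population is added to the training set.

```python
def training_sets(folds, pop, test_fold):
    """Test lines plus the two training sets for one cross-validation step.

    r_w (specific): remaining lines of the test fold's population.
    r_x (combined): the same lines plus every line of the other population.
    """
    test = [i for i, f in enumerate(folds) if f == test_fold]
    test_pop = pop[test[0]]                  # folds never mix populations
    specific = [i for i, f in enumerate(folds)
                if f != test_fold and pop[i] == test_pop]
    other = [i for i, p in enumerate(pop) if p != test_pop]
    return test, specific, sorted(specific + other)

# Hypothetical toy layout: 4 panel lines (folds 1-2), 4 MAGIC lines (folds 11-12)
toy_pop = ["PANEL"] * 4 + ["MAGIC"] * 4
toy_folds = [1, 1, 2, 2, 11, 11, 12, 12]
test, train_w, train_x = training_sets(toy_folds, toy_pop, test_fold=1)
print(test, train_w, train_x)  # [0, 1] [2, 3] [2, 3, 4, 5, 6, 7]
```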

After the 10 iterations for each population, the average correlation between observed and predicted values was calculated. This allowed us to compare, within each population, the predictive ability of models trained on a single population with that of models trained on the combined populations. The accuracy of the predictions was then estimated by dividing the predictive ability by the square root of the heritability (Legarra et al., 2008).
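
The two summary statistics can be written as a short pure‐Python sketch; the helper names are hypothetical. As a check, dividing the panel‐trained, panel‐validated PAs in Table 5 by the square roots of the panel heritabilities in Table 2 reproduces the accuracies reported in parentheses.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def predictive_ability(obs_by_fold, pred_by_fold):
    """Mean over folds of the observed-vs-predicted correlation."""
    rs = [pearson(o, p) for o, p in zip(obs_by_fold, pred_by_fold)]
    return sum(rs) / len(rs)

def accuracy(pa, h2):
    """Accuracy = predictive ability / sqrt(h2) (Legarra et al., 2008)."""
    return pa / math.sqrt(h2)

print(round(accuracy(0.75, 0.91), 2))  # anthesis: 0.79
print(round(accuracy(0.63, 0.59), 2))  # stover yield: 0.82
```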

2.7 Cross‐validation between populations

In order to evaluate whether a model trained on lines from a specific population can accurately predict the phenotypic values of lines from another population, cross‐population validations were performed. In the first analysis, the model was trained using only lines from the Ames panel and validated on lines from the MAGIC population. Conversely, the model was trained with the MAGIC lines and validated on the Ames panel lines (Figure 1B). In both cases, the correlation between observed and predicted values was calculated to assess the inter‐population predictive ability of the model and divided by the square root of heritability to estimate the accuracy of the predictions (Legarra et al., 2008).
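
Cross‐population validation is the same GBLUP fit with the training set restricted to one population and predictions taken for the other. The sketch below assumes known variance components, a crude G built as a scaled cross‐product of simulated −1/+1 markers, and simulated phenotypes; because both toy “populations” come from the same distribution, it does not reproduce the allele‐frequency and LD differences discussed here. The accuracies in Table 5 are consistent with dividing by the heritability of the validation population (e.g., 0.19/√0.91 ≈ 0.20 for MAGIC‐trained models predicting panel anthesis), so the function takes that h² as an argument.

```python
import numpy as np

def cross_population_pa(y, G, pop, train_pop, h2_valid, var_u=1.0, var_e=1.0):
    """Fit GBLUP on all lines of train_pop, predict every other line, and
    return (predictive ability, accuracy) for the predicted population."""
    y, pop = np.asarray(y, dtype=float), np.asarray(pop)
    tr = np.flatnonzero(pop == train_pop)        # training population
    va = np.flatnonzero(pop != train_pop)        # validation population
    V = var_u * G[np.ix_(tr, tr)] + var_e * np.eye(tr.size)
    ones = np.ones(tr.size)
    Vinv_1 = np.linalg.solve(V, ones)
    mu = (Vinv_1 @ y[tr]) / (Vinv_1 @ ones)      # GLS estimate of the mean
    u_va = var_u * G[np.ix_(va, tr)] @ np.linalg.solve(V, y[tr] - mu)
    pa = np.corrcoef(y[va], u_va)[0, 1]          # observed vs predicted
    return pa, pa / np.sqrt(h2_valid)

# Simulated toy data: 60 lines, 200 markers, two labelled "populations"
rng = np.random.default_rng(1)
M = rng.choice([-1.0, 1.0], size=(60, 200))
G = M @ M.T / M.shape[1]                         # simple scaled cross-product
y = M @ rng.normal(0, 0.1, 200) + rng.normal(0, 1.0, 60)
pop = np.array(["PANEL"] * 30 + ["MAGIC"] * 30)
pa, acc = cross_population_pa(y, G, pop, train_pop="MAGIC", h2_valid=0.5)
print(round(pa, 2), round(acc, 2))
```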

3 RESULTS

3.1 Traits’ heritability

The variation range and h² for agronomic traits in the Ames association panel and the MAGIC population are summarized in Table 2. Flowering time showed high heritability in the panel and moderate‐to‐high heritability in the MAGIC population. The differences in heritability for flowering between the two populations were significant, as the confidence intervals of the heritabilities (taken as ±2 standard errors) did not overlap. However, female flowering exhibited a broader range of variation in the MAGIC population than in the panel. For grain and stover yields, heritability estimates were moderate in both populations, and the variation ranges of MAGIC and panel inbreds were similar.
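
The significance criterion used here (non‐overlapping h² ± 2 SE intervals) can be checked with a few lines of Python; the function name is hypothetical and the inputs are the estimates reported in Tables 2 and 4.

```python
def intervals_overlap(h2_a, se_a, h2_b, se_b, k=2.0):
    """True if the intervals h2 +/- k*SE of two estimates overlap."""
    lo_a, hi_a = h2_a - k * se_a, h2_a + k * se_a
    lo_b, hi_b = h2_b - k * se_b, h2_b + k * se_b
    return lo_a <= hi_b and lo_b <= hi_a

# Panel vs MAGIC estimates from Tables 2 and 4
print(intervals_overlap(0.91, 0.01, 0.62, 0.01))  # anthesis: False (significant)
print(intervals_overlap(0.52, 0.06, 0.36, 0.06))  # DFA 5-5: True (not significant)
```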

The MAGIC and Ames panel populations also showed different heritability estimates for stover quality‐related traits, but these differences were not significant (Table 3). Digestibility of organic matter (DOM) and fiber content (acid detergent fiber [ADF] and neutral detergent fiber [NDF]) exhibited moderate heritability and similar ranges of variation in both populations. Saccharification efficiency (SACC) showed the lowest heritability estimates, which were not significantly different from zero in either population.

Regarding hydroxycinnamate contents (Table 4), the panel exhibited slightly higher heritability estimates than the MAGIC population, but the differences were not significant. The highest heritabilities were observed for p‐coumarate and ferulate among Ames panel inbreds. Heritabilities for diferulates (DFAs) were lower than those for ferulate and p‐coumarate in both populations. The largest differences in DFA heritability between the panel and MAGIC populations were found for DFA 5‐5 and DFA 8‐5, although these differences were not significant (DFA 5‐5: 0.52 vs. 0.36; DFA 8‐5: 0.48 vs. 0.32). Variation ranges were generally similar in both populations for all hydroxycinnamates.

3.2 Principal components analysis to study population structure

In the Ames panel, the first two PCs (PC1 and PC2) together accounted for approximately 8% of the variance, suggesting the presence of population structure, although not a pronounced one. In contrast, in the MAGIC population they barely reached 3.5%, suggesting the absence of relevant structure (Figure 2).
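The share of variance explained by PC1 and PC2 can be computed from a centred marker matrix with a singular value decomposition. This is a minimal NumPy sketch that assumes a complete 0/1/2 dosage matrix. The study's own PCA software, SNP filtering, and any column scaling are not described in this section. The simulated data are hypothetical.

```python
import numpy as np


def pc_variance_fractions(genotypes: np.ndarray, n_pcs: int = 2) -> np.ndarray:
    """Share of total genotypic variance captured by each of the first n_pcs PCs.

    genotypes: lines x SNPs matrix of allele dosages (0/1/2) with no missing values.
    """
    centred = genotypes - genotypes.mean(axis=0)         # centre every SNP column
    singular = np.linalg.svd(centred, compute_uv=False)  # singular values, descending
    eig = singular ** 2                                  # proportional to PC variances
    return eig[:n_pcs] / eig.sum()


# Simulated, unstructured population (illustration only): 200 lines x 1,000 SNPs.
rng = np.random.default_rng(0)
geno = rng.binomial(2, 0.5, size=(200, 1000)).astype(float)
fractions = pc_variance_fractions(geno)
print(f"PC1 + PC2 explain {100 * fractions.sum():.1f}% of the variance")
```

In a simulated population without structure, the first two PCs capture only a small share of the variance, which matches the situation described for MAGIC. Clustered germplasm raises this share.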

Scree plot of the variance explained by the principal components in (A) the Ames association panel lines and (B) the multiparent advanced generation intercross (MAGIC) population.

When the lines were plotted on the first two PCs, the clusters in the Ames panel did not strictly correspond to the groups of origin, although up to four clusters could be distinguished. In contrast, the MAGIC population showed no apparent structure, indicating a more uniform distribution of genetic variation among its lines, without distinct subgroups (Figure 3).

Principal component analysis (PCA) showing the distribution of the Ames association panel lines based on (A) their breeding program and (B) germplasm group, (C) the multiparent advanced generation intercross (MAGIC) population, and (D) the combined visualization of both populations, using genotyping‐by‐sequencing (GBS) data.

This clustering pattern (or its absence, in the case of MAGIC) is relevant for genomic selection, since prediction accuracy increases when the training set includes lines that are genetically similar to those to be predicted. Genomic selection is especially effective for highly polygenic traits governed by numerous small‐effect genes, for which overall genomic similarity matters more than matching specific alleles.

3.3 Genomic prediction

Predictive ability and accuracy varied for agronomic traits depending on the training and validation groups used (Table 5).
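Predictive ability (PA) is the correlation between predicted genomic values and observed phenotypes. Accuracy is commonly estimated by dividing PA by the square root of heritability. The sketch below assumes that convention applies here, since the definition used in this study is not shown in this section. The values in the usage line are hypothetical.

```python
import math


def accuracy_from_pa(pa: float, h2: float) -> float:
    """Convert predictive ability r(GEBV, y) into accuracy r(GEBV, g) as PA / sqrt(h2)."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    return pa / math.sqrt(h2)


# Hypothetical values: PA = 0.36 with h2 = 0.81 gives 0.36 / 0.9.
print(round(accuracy_from_pa(0.36, 0.81), 3))  # 0.4
```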

Training and validation based on panel inbred lines showed that the genomic model had very high predictive ability for flowering traits (predictive ability [PA] = 0.75) and moderate to moderately high predictive ability for yields (0.39 for grain yield and 0.63 for stover yield). In contrast, when the model was trained only with lines from the MAGIC population, the PA for panel inbreds decreased markedly for all traits, falling below 0.2 for flowering traits. Combining panel and MAGIC inbreds in the training set yielded PAs for panel inbreds comparable to those obtained with a model trained only on panel inbreds.
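A GBLUP prediction from a genomic relationship matrix, followed by the PA calculation, can be sketched as below. The sketch rests on several assumptions: it uses the VanRaden (method 1) relationship matrix, a known variance ratio `lam` (residual over genetic variance), the training‐set mean in place of a GLS estimate of the intercept, and simulated data. It is an illustrative kernel formulation, not the software or settings used in the study.

```python
import numpy as np


def vanraden_grm(dosages: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix (VanRaden method 1) from a lines x SNPs 0/1/2 matrix."""
    p = dosages.mean(axis=0) / 2.0                 # allele frequency of each SNP
    z = dosages - 2.0 * p                          # centre each column by 2p
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(K, y, train, test, lam):
    """Predict genetic values of `test` lines from phenotypes of `train` lines.

    Kernel form of GBLUP: g_test = K[test, train] (K[train, train] + lam*I)^-1 (y_train - mean).
    lam = residual variance / genetic variance, assumed known here.
    """
    y_train = y[train]
    k_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(k_tt + lam * np.eye(len(train)), y_train - y_train.mean())
    return K[np.ix_(test, train)] @ alpha


def predictive_ability(observed, predicted) -> float:
    """PA: Pearson correlation between observed phenotypes and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1])


# Simulated data (illustration only): 300 lines, 500 SNPs, many small additive effects.
rng = np.random.default_rng(1)
dosages = rng.binomial(2, 0.3, size=(300, 500)).astype(float)
effects = rng.normal(0.0, 0.05, size=500)
g = (dosages - dosages.mean(axis=0)) @ effects
y = g + rng.normal(0.0, g.std(), size=300)         # roughly h2 = 0.5, so lam = 1

K = vanraden_grm(dosages)
order = rng.permutation(300)
test, train = order[:60], order[60:]               # one hold-out fold
pred = gblup_predict(K, y, train, test, lam=1.0)
print(f"PA = {predictive_ability(y[test], pred):.2f}")
```

In practice the variance components would be estimated (for example by REML) and PA averaged over many cross‐validation folds, rather than taken from a single split.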

When the training set was composed only of panel inbred lines and validation was performed on MAGIC lines, PAs were zero or close to zero for days to silking and yields. However, when training was performed with MAGIC inbreds only, or with MAGIC and panel inbreds together, the ability to predict the performance of MAGIC inbreds improved, and the values obtained with both training sets were very similar. Even so, PAs for models trained and validated exclusively with panel inbreds were considerably higher than those for models trained and validated only with MAGIC inbreds.
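The three training schemes compared here (within‐population, cross‐population, and combined) differ only in which lines enter the training set for a given validation fold. The helper below is hypothetical. It assumes each line carries a population label and that validation lines are held out of every training set.

```python
import numpy as np


def training_sets(pop: np.ndarray, target: str, test: np.ndarray) -> dict:
    """Training indices for predicting held-out `test` lines of population `target`.

    pop: one population label per line, e.g. "panel" or "MAGIC".
    The test lines are always excluded from every training set.
    """
    candidates = np.setdiff1d(np.arange(len(pop)), test)
    same = candidates[pop[candidates] == target]
    other = candidates[pop[candidates] != target]
    return {
        "within": same,                               # trained within the target population
        "across": other,                              # trained only on the other population
        "combined": np.concatenate([same, other]),    # trained on both populations
    }


# Toy layout (hypothetical): six panel lines, four MAGIC lines; lines 0 and 1 held out.
pop = np.array(["panel"] * 6 + ["MAGIC"] * 4)
sets = training_sets(pop, "panel", test=np.array([0, 1]))
print({name: idx.tolist() for name, idx in sets.items()})
```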

As shown in Figure 4, combining lines from both populations in the training set maintained predictive abilities similar to those obtained using only lines from the target population. Therefore, incorporating lines with a different genetic background did not appear to strengthen the model.

Scatter plots of observed versus predicted values for (A) male flowering, (B) female flowering, (C) grain yield, and (D) stover yield. Three genomic best linear unbiased prediction (GBLUP) models are shown: r_x (green circles), trained with all lines (panel + multiparent advanced generation intercross [MAGIC]) and validated in the corresponding test set; r_w (panel; blue triangles), trained and validated within the panel; and r_w (MAGIC; orange squares), trained and validated within MAGIC.

A similar pattern was observed for stover quality‐related traits (Table 6): the highest PAs were obtained when models were trained and validated on the same population. In both populations, PAs for ADF and NDF were moderate (∼0.3). However, the PA of the model trained with panel inbreds to predict DOM in panel inbreds was very low (0.15), whereas the model trained and validated exclusively in the MAGIC population reached a moderate value (0.37). Accordingly, when both populations were combined in the training set and validation was performed on panel inbreds, the PA improved slightly (0.18). For SACC, the PA was slightly higher for the model built only with panel inbreds (0.22 vs. 0.14). The combined models showed PA values similar to those obtained with models trained on a single population (Figure 5), whereas models trained on one population and validated on the other yielded a null PA.

Scatter plots of observed versus predicted values for (A) digestibility of organic matter, (B) acid and (C) neutral detergent fiber, and (D) saccharification efficiency. Three genomic best linear unbiased prediction (GBLUP) models are shown: r_x (green circles), trained with all lines (panel + multiparent advanced generation intercross [MAGIC]) and validated in the corresponding test set; r_w (panel; blue triangles), trained and validated within the panel; and r_w (MAGIC; orange squares), trained and validated within MAGIC.

For cell wall‐bound hydroxycinnamates (Table 7), models trained and validated within the same population once again achieved the highest PAs, with models built using panel inbreds performing better. The combined models showed PA values very similar to or slightly lower than those obtained with models trained on a single population. Figure 6 clearly illustrates that, for DFAs, the panel lines (r_w [panel]) and the MAGIC lines (r_w [MAGIC]) were notably distinct; however, combining lines from both populations (r_x) did not improve the model's PA. Furthermore, a model trained on one population could not predict the performance of inbreds from the other, as the PA estimates were zero.

Scatter plots of observed versus predicted values for (A) p‐coumarate, (B) ferulate, (C) diferulate (DFA) 5‐5, (D) DFA 8‐O‐4, (E) DFA 8‐5, and (F) total diferulates (DFAT). Three genomic best linear unbiased prediction (GBLUP) models are shown: r_x (green circles), trained with all lines (panel + multiparent advanced generation intercross [MAGIC]) and validated in the corresponding test set; r_w (panel; blue triangles), trained and validated within the panel; and r_w (MAGIC; orange squares), trained and validated within MAGIC.

4 DISCUSSION

Our results show that genomic predictive ability is conditioned by both the heritability of traits and the genetic structure of the populations analyzed. Previous studies have documented a positive correlation between heritability and predictive ability: traits with higher h² tend to have higher PA (Dzievit et al., 2021; Kaler et al., 2022; Liu et al., 2018; Zhang et al., 2017). In line with this, the heritability and PA estimates for days to flowering were higher in the Ames panel than in the MAGIC population, although a wider phenotypic range for female flowering was observed in the MAGIC population. This broader phenotypic range may be explained by the presence of EP17, one of the eight parents, which is late flowering. Because parental alleles are balanced in the MAGIC design, the influence of late‐flowering alleles becomes more evident. In contrast, the lines included in the diversity panel were selected for their successful adaptation to local conditions in Pontevedra, which likely excluded more extreme phenotypes and narrowed the phenotypic range for this trait. This indicates that the accuracy of predicted genomic values depends on the proportion of variance that is genetic and captured by the markers (that is, a high h²), rather than on a larger phenotypic variance (Kaler et al., 2022).
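The expected link between h² and prediction accuracy can be illustrated with a widely used deterministic approximation, r = sqrt(N·h² / (N·h² + Me)). Here N is the training size and Me is the effective number of independent chromosome segments. The study did not use this formula, and the N and Me values below are hypothetical.

```python
import math


def expected_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Deterministic approximation r = sqrt(N*h2 / (N*h2 + Me)) of genomic prediction accuracy."""
    signal = n_train * h2
    return math.sqrt(signal / (signal + m_e))


# Same hypothetical training size and Me, two heritabilities.
for h2 in (0.9, 0.3):
    print(h2, round(expected_accuracy(300, h2, 100), 3))
```

With everything else fixed, the higher‐h² trait receives the higher expected accuracy (about 0.85 vs. 0.69 with these inputs).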

PA was significantly higher in intra‐population validation, that is, when the validation individuals belonged to the same population as those used for training, than in inter‐population validation. The Ames panel, with 8% of the variance explained by the first two PCs, has a slight population structure and stable LD phases (Gesteiro et al., 2025), which favors internal validation (Kaler et al., 2022; Rincent et al., 2017). In contrast, the MAGIC population, which is genetically homogeneous (3.5% of the variance explained by PC1 and PC2), showed lower intra‐population PA. More importantly, cross‐population PA collapsed (PA < 0.05) when training in Ames and validating in MAGIC or vice versa, a common phenomenon when LD phases and allele frequencies differ between training and validation sets (Lopez‐Cruz et al., 2021; Ramstein & Buckler, 2022; G. Yu et al., 2024). Discordance in LD and allele frequencies between populations limits the transfer of additive effects: markers that are predictive in one population lose power in the other (Bian & Holland, 2017; Steyn et al., 2019; G. Yu et al., 2024). This collapse of cross‐population PA has also been associated with increasing genetic distance (F_ST) between groups, whereas kinship‐optimized training sets achieve improvements, especially for low‐h² traits (Li et al., 2021; Scutari et al., 2016).
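Two population‐level quantities invoked here, allele‐frequency divergence and consistency of LD phase, can be measured on a shared SNP set. This NumPy sketch assumes both populations are genotyped at the same ordered SNPs with no missing data. It summarizes LD phase persistence as the correlation of adjacent‐SNP r values between populations, which is one common summary and not necessarily the one used by Gesteiro et al. (2025). The data are simulated.

```python
import numpy as np


def allele_freqs(dosages: np.ndarray) -> np.ndarray:
    """Frequency of the counted allele at each SNP (lines x SNPs, coded 0/1/2)."""
    return dosages.mean(axis=0) / 2.0


def adjacent_r(dosages: np.ndarray) -> np.ndarray:
    """Signed correlation r between each pair of adjacent SNPs (columns j and j+1)."""
    x = dosages - dosages.mean(axis=0)
    num = (x[:, :-1] * x[:, 1:]).sum(axis=0)
    den = np.sqrt((x[:, :-1] ** 2).sum(axis=0) * (x[:, 1:] ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den                       # NaN where a SNP is monomorphic


def compare_populations(dos_a: np.ndarray, dos_b: np.ndarray) -> dict:
    """Mean allele-frequency difference and LD-phase persistence between two populations
    genotyped at the same, identically ordered SNPs."""
    r_a, r_b = adjacent_r(dos_a), adjacent_r(dos_b)
    ok = np.isfinite(r_a) & np.isfinite(r_b)   # keep pairs informative in both populations
    return {
        "mean_abs_freq_diff": float(np.mean(np.abs(allele_freqs(dos_a) - allele_freqs(dos_b)))),
        # correlation of r across populations: close to 1 when LD phases are shared
        "ld_phase_persistence": float(np.corrcoef(r_a[ok], r_b[ok])[0, 1]),
    }


# Simulated populations (illustration only) with different allele frequencies.
rng = np.random.default_rng(2)
pop_a = rng.binomial(2, 0.5, size=(150, 400)).astype(float)
pop_b = rng.binomial(2, 0.2, size=(150, 400)).astype(float)
print(compare_populations(pop_a, pop_a))   # identical: no frequency difference, persistence 1
print(compare_populations(pop_a, pop_b))   # independent: large frequency gap, persistence near 0
```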

Attempts to combine Ames and MAGIC in a single training set did not provide consistent benefits. In a maize dent panel (“Amaizing Dent”), it was observed that, for a fixed training size, calibrating models within the same genetic group maximizes PA, although a diverse training set can maintain moderate accuracies for all subgroups, with only marginal benefits from adding individuals from other groups (Rio et al., 2019). In our study, simply merging populations did not strengthen the model and sometimes slightly reduced intra‐population PA, probably due to dilution of group‐specific LD phases.

To overcome these limitations, optimizing training set size and composition is key. Strategies such as selecting training individuals with high genetic relatedness to the validation population can increase PA by up to 16% in GBLUP and BayesB models (Li et al., 2021; Scutari et al., 2016). Moreover, incorporating population structure as a fixed effect (PSAPGP model) provides additional improvements of 8%–11% in cross‐population predictions (G. Yu et al., 2024). However, although these approaches have proven moderately effective in specific contexts, their broader applicability may be limited by their computational complexity, which does not always translate into substantial gains, and, more importantly, by the high genetic heterogeneity among populations.
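Relatedness‐based training set selection can be approximated by ranking candidate lines by their mean genomic relationship to the validation lines. This criterion is deliberately simple and hypothetical, for illustration only. The optimization procedures of Li et al. (2021) and Scutari et al. (2016) are more elaborate and are not reproduced here.

```python
import numpy as np


def select_by_relatedness(K: np.ndarray, candidates, validation, size: int) -> np.ndarray:
    """Choose `size` training lines with the highest mean genomic relationship
    to the validation lines (a simple relatedness-based criterion)."""
    candidates = np.asarray(candidates)
    mean_rel = K[np.ix_(candidates, validation)].mean(axis=1)
    ranked = candidates[np.argsort(mean_rel)[::-1]]    # most related first
    return ranked[:size]


# Hypothetical 4 x 4 relationship matrix; line 2 is the validation line.
K = np.array([
    [1.0, 0.1, 0.8, 0.0],
    [0.1, 1.0, 0.2, 0.0],
    [0.8, 0.2, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
])
print(select_by_relatedness(K, candidates=[0, 1, 3], validation=[2], size=2))  # [0 1]
```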

Taken together, our results suggest that combining genetically distinct populations does not improve PA and may sometimes reduce it. In contrast, structured populations, such as diversity panels, tend to provide higher PA, likely due to their clearer genetic structure and more consistent LD patterns. Trait heritability also plays a crucial role in the success of genomic prediction, with higher heritability values associated with better model performance. These findings highlight the importance of carefully designing training sets that balance genetic diversity and relatedness, incorporating information on kinship, population structure, and trait biology to develop robust and transferable prediction models.

In addition, GWAS results from both populations (Gesteiro et al., 2023; Gesteiro, Malvar, Butrón, Holland, López‐Malvar, et al., 2025; Gesteiro, Malvar, Butrón, Holland, Souto, et al., 2025; López‐Malvar, Butron, et al., 2021; López‐Malvar, Malvar, et al., 2021; López‐Malvar et al., 2022) show that significant SNPs do not colocalize between the diversity panel and the MAGIC population. This confirms that there are no markers with stable effects across populations that could improve cross‐population prediction through marker‐assisted approaches. Consequently, for these highly polygenic traits, the most relevant practical implication is that improving prediction between populations depends on optimizing the composition of the training set (relatedness, LD patterns, and population structure) rather than relying on individual SNPs. This highlights that genomic selection strategies should prioritize the genetic similarity between training and target materials to maximize predictive ability in practical breeding programs.
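Colocalization of significant SNPs between two GWAS can be checked by searching for hits from one study within a fixed physical window around hits from the other. In this hypothetical sketch, the window size is an assumption; in practice it would be tied to the LD decay of each population. Positions must refer to the same reference genome (e.g., B73v5, as in Supplementary Table S4), and the hit lists below are invented.

```python
from bisect import bisect_left


def colocalized_hits(hits_a, hits_b, window_bp: int):
    """Pairs of significant SNPs, one from each GWAS, on the same chromosome and
    within `window_bp` base pairs of each other.

    hits_a, hits_b: iterables of (chromosome, position) tuples.
    """
    positions_b = {}
    for chrom, pos in hits_b:
        positions_b.setdefault(chrom, []).append(pos)
    for plist in positions_b.values():
        plist.sort()

    pairs = []
    for chrom, pos in hits_a:
        plist = positions_b.get(chrom, [])
        i = bisect_left(plist, pos - window_bp)      # first B hit not left of the window
        while i < len(plist) and plist[i] <= pos + window_bp:
            pairs.append((chrom, pos, plist[i]))
            i += 1
    return pairs


# Hypothetical hit lists (not results from the cited studies).
panel_hits = [(1, 10_000), (3, 250_000)]
magic_hits = [(1, 10_400), (8, 250_000)]
print(colocalized_hits(panel_hits, magic_hits, window_bp=500))   # [(1, 10000, 10400)]
print(colocalized_hits(panel_hits, magic_hits, window_bp=100))   # []
```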

Moreover, in practical breeding programs, where selection is performed on recombinant families with high LD, genomic selection proves especially useful for choosing starting material, even when not all germplasm is genotyped. Therefore, tailoring training and validation strategies to the specific biological and genetic context of the breeding program maximizes the potential of genomic selection in maize.

AUTHOR CONTRIBUTIONS

A. López‐Malvar: Conceptualization; data curation; investigation; writing—original draft; writing—review and editing. R. Santiago: Funding acquisition; investigation; resources; writing—review and editing. A. Butrón: Conceptualization; data curation; funding acquisition; investigation; resources; writing—review and editing. R. A. Malvar: Conceptualization; data curation; funding acquisition; investigation; project administration; resources; supervision; writing—review and editing. N. Gesteiro: Conceptualization; data curation; formal analysis; investigation; methodology; software; supervision; validation; visualization; writing—original draft; writing—review and editing.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

Supporting information

Supplementary Table S1. Best Linear Unbiased Estimates (BLUEs) of all traits evaluated in both the MAGIC population and the diversity panel.

Supplementary Table S2. GWAS association results for silking in the diversity panel (GAPIT BLINK output).

Supplementary Table S3. GWAS association results for anthesis in the diversity panel (GAPIT BLINK output).

Supplementary Table S4. Consolidated list of all significant SNPs identified in the diversity panel aligned to the B73v5 reference genome.

Supplementary Figure S1. Manhattan plots for silking and anthesis obtained from GWAS in the diversity panel.

Bibliography

1. Albrecht, T., Auinger, H. J., Wimmer, V., Ogutu, J. O., Knaak, C., Ouzunova, M., Piepho, H. P., & Schön, C. C. (2014). Genome‐based prediction of maize hybrid performance across genetic groups, testers, locations, and years. Theoretical and Applied Genetics, 127, 1375–1386. https://doi.org/10.1007/S00122-014-2305-Z (PMID: 24723140)
2. Bian, Y., & Holland, J. B. (2017). Enhancing genomic prediction with genome‐wide association studies in multiparental maize populations. Heredity, 118, 585–593. https://doi.org/10.1038/HDY.2017.4 (PMID: 28198815; PMCID: PMC5436027)
3. Bradbury, P. J., Zhang, Z., Kroon, D. E., Casstevens, T. M., Ramdoss, Y., & Buckler, E. S. (2007). TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics, 23, 2633–2635. https://doi.org/10.1093/BIOINFORMATICS/BTM308 (PMID: 17586829)
4. Butrón, A., Santiago, R., Cao, A., Samayoa, L., & Malvar, R. (2019). QTLs for resistance to fusarium ear rot in a multiparent advanced generation intercross (MAGIC) maize population. Plant Disease, 103, 897–904. https://doi.org/10.1094/PDIS-09-18-1669-RE (PMID: 30856072)
5. Cockram, J., & Mackay, I. (2018). Genetic mapping populations for conducting high‐resolution trait mapping in plants. Advances in Biochemical Engineering/Biotechnology, 164, 109–138. https://doi.org/10.1007/10_2017_48 (PMID: 29470600)
6. Crossa, J., Pérez‐Rodríguez, P., Cuevas, J., Montesinos‐López, O., Jarquín, D., de los Campos, G., Burgueño, J., González‐Camacho, J. M., Pérez‐Elizalde, S., Beyene, Y., Dreisigacker, S., Singh, R., Zhang, X., Gowda, M., Roorkiwal, M., Rutkoski, J., & Varshney, R. K. (2017). Genomic selection in plant breeding: Methods, models, and perspectives. Trends in Plant Science, 22, 961–975. https://doi.org/10.1016/J.TPLANTS.2017.08.011 (PMID: 28965742)
7. Dzievit, M. J., Guo, T., Li, X., & Yu, J. (2021). Comprehensive analytical and empirical evaluation of genomic prediction across diverse accessions in maize. The Plant Genome, 14, e20160. https://doi.org/10.1002/TPG2.20160 (PMID: 34661990; PMCID: PMC12806877)
8. Endelman, J. B. (2011). Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome, 4, 250–255. https://doi.org/10.3835/PLANTGENOME2011.08.0024