Predicting hybrid fitness: the effects of ploidy and complex ancestry

Hilde Schneemann; John J Welch

PMC · DOI:10.1093/genetics/iyaf242·November 8, 2025

Predicting hybrid fitness: the effects of ploidy and complex ancestry

Hilde Schneemann, John J Welch

PDF

Open Access

TL;DR

This paper introduces a model to predict hybrid fitness in plants with different levels of ploidy and ancestry, helping understand how hybridization affects adaptation and isolation.

Contribution

A novel fitness landscape model is introduced to predict hybrid fitness with arbitrary ploidy and multiple hybridizing lineages.

Findings

01

The model successfully predicts hybrid fitness in maize and rye with diploid and tetraploid hybrids.

02

The model captures dosage and genetic interactions important for heterosis and reproductive isolation.

Abstract

Hybridization between divergent populations places alleles in novel genomic contexts. This can inject adaptive variation—which is useful for breeders and conservationists—or reduce fitness, leading to reproductive isolation. Most theoretical work on hybrids involves haploid or diploid hybrids between two parental lineages, but real-world hybridization is often more complex. We introduce a simple fitness landscape model to predict hybrid fitness with arbitrary ploidy and an arbitrary number of hybridizing lineages. We test our model on published data from maize (Zea mays) and rye (Secale cereale), including hybrids between multiple inbred lines, both as diploids and synthetic tetraploids. Quantitative predictions for the effects of inbreeding, and the strength of progressive heterosis, are well supported. Results suggest that the model captures the important properties of dosage and…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species2

Zea mays Secale cereale

Figures2

Click any figure to enlarge with its caption.

Fig. 1 — Reanalysis of hybrid maize data from Yao et al. (2020). a) The decline in fitness over 7 generations of inbreeding is slower in tetraploid (▴) than in matched diploid (∙) maize crosses. Points and bars show means and standard errors for ear length, measured as deviations from the appropriate F1, then pooled over crosses, and transformed to minimize the skew around the within-generation means (see Appendix B: Details of reanalysis of Yao et al. (2020) maize data). Lines show the least-squares fit of the nonlinear model of equation (B1) (Baty et al. 2015) with β fixed at its expected value. b) Estimates of the parameter β, which captures the retention of heterozygosity, with 95% confidence intervals (Baty et al. 2015). Estimates of β for both pooled and individual crosses match predictions of β=1/2 for diploids, and β=5/6 for tetraploids (equation (6)). c) Estimates of ploidy from model comparison with K=2,4, or 6 indicate that the correct ploidy gives the best likelihood for both the pooled diploid and tetraploid data (left and right y-axis, respectively). Error bars indicate 2 units of log likelihood. d) Fitting equation (9), estimates the different effects of balanced and unbalanced heterozygotes in tetraploids, and agrees with predictions of β13/β22=3/4 (confidence intervals on the ratio used the “delta method”; Fox et al. 2001; Fox and Weisberg 2019).

Fig. 2 — Reanalysis of hybrid rye data from Lundqvist (1966). a) Predicted fitness for different crosses in diploids (∙) and tetraploids (▴), with ancestry from P=1,2,3, or 4 parental lines (equation (2)). The diagram assumes equal fitness for the parents and F1 for visual clarity, and with this assumption, ploidies differ only in their F2 breakdown (fitness reduction between F1 and F2; equation (12)). Fitness epistasis, I¯ (faint double-sided arrows), can vary in sign, and affect both F1 heterosis (improvement between Parents and F1; equation (11)) and progressive heterosis (improvement from adding parents to the F2; equation (16)). A maternal affect (upward facing red arrows) may increase the fitness of all F2, because they were grown from fitter F1 mothers. b) Slopes that indicate F1 heterosis and progressive heterosis should match (equation (16); dashed lines); while the F2 breakdown slope should be 1/3 smaller (equation (12); dotted line), but maternal effects make the latter difficult to test. c) F1 heterosis in diploids accurately predicts the progressive heterosis, in both diploids and tetraploids (equation (16)). Results show the difference in W¯ for the crosses indicated, normalized by the difference between diploid parents and F1. Error bars are jackknife resamples, removing any cross with ancestry from each of 6 inbred lines (black bars), or each of 15 pairs of lines (faint bars). d) Rye data, with total kernel yield transformed to maximize the correlation between ploidies for the Parent and F1 measurements (see Appendix C: Details of reanalysis of Lundqvist (1966) rye data), and individual crosses (∙) compared to means of cross types (+). The best-fit SMA regression (Warton et al. 2012) has a slope close to 1 (solid line), implying that polyploidization little changed the Mij and Iij parameters.

Equations16

Funding2

—Wellcome Trust10.13039/100010269
—Horizon 202010.13039/100010661

Keywords

hybridizationploidyheterosisgenetic interactions

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGenetic Mapping and Diversity in Plants and Animals · Genetics and Plant Breeding · Genetic diversity and population structure

Full text

Introduction

Hybridization occurs when individuals from genetically differentiated lineages mate and produce offspring. These hybrids carry novel combinations of alleles, exposing allelic effects in different genomic backgrounds (Burch et al. 2024; Peñalba et al. 2024). When alleles function well in their new background, hybridization can facilitate adaptation (e.g. Song et al. 2011; Pardo-Diaz et al. 2012; Abbott et al. 2013; Hedrick 2013; Kulmuni et al. 2023)—a fact exploited by breeders (East 1909; Shull 1909; Gowen 1952; Suneson 1956; Gerdes et al. 1999; Gur and Zamir 2004; Mackay et al. 2020; ter Steeg et al. 2022), and by conservationists (Genovart 2008; Chan et al. 2019). Conversely, when alleles function poorly, low fitness hybrids may form a barrier to genetic exchange, contributing to reproductive isolation and speciation (Dobzhansky 1937; Butlin 1987; Levin 1985; Hoskin et al. 2005). Hence, studying hybrids and predicting their fitness is important in several areas of biology.

Decades of empirical work has revealed recurrent patterns in hybrid fitness data, suggestive of common and general features of genetic interactions (Kölreuters 1766; Darwin 1859; Haldane 1922; Bateson 1978; Butlin 1987; Waser 1993; Trouve et al. 1998; Turelli and Moyle 2007; Schilthuizen et al. 2011; Wei and Zhang 2018; Dagilis et al. 2019; Coyne and Orr 2004, Chapter 8). Inspired by these data, theoretical work has investigated simple fitness landscape models, to see whether they can generate the patterns observed (Orr 1995; Barton 2001; Gavrilets 2004; Fraïsse et al. 2016; Simon et al. 2018; Satokangas et al. 2020; Schneemann et al. 2024). However, most theory has two major limitations. First, predictions apply only to haploid or diploid hybrids, while empiricists often study higher ploidies (Otto and Whitton 2000; Birchler 2013; Washburn and Birchler 2014; Clo and Kolář 2022). Second, most predictions apply to hybrids between just two parental lineages, while hybrids can have more complex ancestry—whether in nature (Harvey et al. 2019; Natola et al. 2022; Dean et al. 2024), or in agriculture (Busbice and Wilsie 1966; Groose et al. 1989; Bingham 1980; Riddle and Birchler 2008; Washburn et al. 2019). For example, in the Kerguelen islands, hybrid mussels (Mytilus edulis complex) contain ancestry from multiple populations of at least three species (Fraïsse et al. 2021); while on British organic farms, hexaploid winter wheat (Triticum aestivum), is grown as Composite Cross Populations, combining 20 inbred lines (Suneson 1956; Knapp et al. 2020).

Here, we consider a class of fitness landscapes used to study two-parent haploid and diploid hybrids by Barton (2001), Chevin et al. (2014), and De Sanctis et al. (2023). We generalize this model to hybrids of arbitrary ploidy, and with ancestry from an arbitrary number of parental lines. We then compare predictions to two extraordinary datasets, from maize (Zea mays; Yao et al. 2020), and rye (Secale cereale; Lundqvist 1966). Both data sets compare diploids and synthetic tetraploids of the same genotypes, and both contain hybrids between multiple inbred lines.

Model

The aim of this work is to predict the fitness of a hybrid from a few measures of its genomic composition (Lande 1981; Hill 1982). For diploid hybrids between two parental lineages, A and B, the genomic composition can be quantified by the hybrid indices $[eqn]$ and $[eqn]$ , defined as the proportion of alleles in the hybrid that derive from each parental lineage; and the ancestry heterozygosity $[eqn]$ , defined as the proportion of loci with one allele from each lineage. To predict fitness from just these quantities, we need to assume that hybrids contain an effectively random sample of alleles from the parental populations (see also Appendix A: Derivation of main result). Even with this assumption, the predicted fitness might be an arbitrarily complex expression; this is because predictions need not be linear in $[eqn]$ and $[eqn]$ , but might also include higher-order terms in e.g. $[eqn]$ , $[eqn]$ etc. Fitness landscape models allow us to make testable predictions about the relative sizes of these terms. In particular, previous work has derived the following prediction, which contains just one second-order term, and two free parameters:

[eqn]

(Chevin et al. 2014; De Sanctis et al. 2023; Schneemann et al. 2024). In this expression, W is the fitness of the hybrid, which must be measured on a scale that is bounded above but unbounded below (e.g. the log number of expected offspring; Lynch and Walsh 1998, Chapter 10; Fraïsse et al. 2016; Schneemann et al. 2024), and $[eqn]$ and $[eqn]$ are the similarly transformed fitness of the original parental lineages in the environment where the hybrids were scored. The model also makes predictions about the parameters $[eqn]$ and $[eqn]$ . In particular, it predicts that $[eqn]$ , such that heterozygosity will always increase hybrid fitness. By contrast, $[eqn]$ can vary in sign (depending on the divergence and selection history of the parental lineages; see Chevin et al. 2014; Schneemann et al. 2020; De Sanctis et al. 2023) but is predicted to be bounded below at $[eqn]$ .

Biologically, equation (1) partitions the fitness effects of hybridization into (i) effects of the parental fitnesses, which are weighted by their contributions to hybrid ancestry; (ii) heterotic increases in fitness due to the masking effects of heterozygosity (Manna et al. 2011; Billiard et al. 2021; Schneemann et al. 2022); and (iii) fitness effects of admixture, which can be positive or negative, depending on the sign of $[eqn]$ (Slatkin and Lande 1994; Simon et al. 2018; Schneemann et al. 2020). The simplicity and inflexibility of equation (1) stem from its origin in a simple model of quadratic optimizing selection on additive quantitative traits (Lande 1976; Barton 2001). This model implies that only interactions between pairs of alleles contribute to fitness variation (Hill 1982; Martin et al. 2007). Nonetheless, it generates a rugged genotypic fitness landscape, with variable fitness dominance and epistasis (Martin et al. 2007; Manna et al. 2011; Hwang et al. 2017), and it might make successful predictions even when the phenotypic model is not taken literally (Martin 2014). A full derivation is found in Appendix A: Derivation of main result) and extensions that allow for, e.g. directional- or under-dominance are found in Schneemann et al. (2022) and De Sanctis et al. (2023).

Extension to complex ancestry and arbitrary ploidy

Equation (1) applies to hybrids between two parental lineages with diploid segregation. However, it is easily generalized with a change in notation. To see this, let us redefine $[eqn]$ as the proportion of alleles that descend from lineage A at a single locus; so that the hybrid index is its expected value across loci, now denoted $[eqn]$ . For a heterozygous locus, $[eqn]$ and $[eqn]$ are both $[eqn]$ , such that the expected heterozygosity across loci is $[eqn]$ . Finally, if we number the parental lineages from $[eqn]$ , then with $[eqn]$ equation (1) can be written as

[eqn]

In Appendix A: Derivation of main result, we show that equation (2) applies equally to hybrids containing ancestry from an arbitrary number of parental lineages, P, and with arbitrary ploidy, K.

Although K does not appear explicitly in equation (2), the equation does allow us to distinguish between two possible consequences of polyploidization. First, a change in ploidy might alter the fitnesses of the parental genotypes, $[eqn]$ , and the parameters $[eqn]$ and $[eqn]$ . In the current framework, the resulting changes could only be predicted if some property of the underlying phenotypic effects remained unchanged after polyploidization. For example, if the homozygous effects of alleles remained unchanged, then $[eqn]$ , $[eqn]$ , and $[eqn]$ would also remain unchanged. If, by contrast, the individual allelic effects were unchanged, then $[eqn]$ , $[eqn]$ , and $[eqn]$ would all scale with $[eqn]$ (see Appendix A: Derivation of main result equations (A14)–(A16)). A second set of predictions involve the measures of ancestry, $[eqn]$ , $[eqn]$ and $[eqn]$ . How these quantities change across generations will depend on the mode of segregation (e.g. disomic, tetrasomic), and thereby on the ploidy (Otto and Whitton 2000; Riddle et al. 2006; Soltis et al. 2014). We can test predictions about these changes, whatever the other effects of polyploidization.

Testing the predictions

In the remainder of the article, we test the predictions of equation (2) against the data of Yao et al. (2020) and Lundqvist (1966). While each study reported plausible correlates of vigor and reproductive success, neither measured fitness directly. This is an issue because proxies for fitness have no natural scale of measurement, and different transformations of the data will give different fits to the model (Fraïsse et al. 2016; Carlson et al. 2025). While such flexibility is sometimes useful, allowing us to choose the best-fitting transformation (Lynch and Walsh 1998, Chapter 11), here we use a more stringent approach. First, before undertaking any analyses, we chose the reported trait that seemed to us the most suitable proxy for fitness. We then transformed the trait measurements to better meet the assumptions of the statistical tests, but prior to the model fitting itself, and without assuming the adequacy of equation (2) (see below for details). Before each reanalysis, we derive the additional results required to fit the model to each data set.

Inbreeding depression with selfing

While initial F1 hybrids often see heterotic gains in fitness and vigor, subsequent inbreeding tends to reduce the gains, implying a role for heterozygosity. A major goal of Yao et al. (2020) was to explore the effects of ploidy on this loss of heterosis under inbreeding (Muller 1914; Haldane 1930), comparing diploids and tetraploids. One complication is that tetraploids—even with $[eqn]$ parents—contain two types of heterozygote: balanced (AABB) and unbalanced (AAAB or ABBB), which might contribute differently to fitness (Lundqvist 1966; Ronfort 1999; Birchler and Veitia 2012). Equation (2) assumes that all such effects can be captured by interactions between pairs of alleles, so that we only need to track the genome-wide average of $[eqn]$ . For balanced and unbalanced heterozygotes, we have $[eqn]$ and $[eqn]$ , respectively, and so

[eqn]

which is a weighted sum of both types of heterozygote. To see how $[eqn]$ changes under inbreeding, let us note its relation to Wright’s inbreeding coefficient, F (Wright 1922): the probability that two alleles chosen without replacement from the same locus, are identical by descent (i.e. descend from the same parental lineage). From this definition, it follows that

[eqn]

After t generations of selfing, the inbreeding coefficient changes at a constant rate β, via

[eqn]

(Wright 1969; Crow and Kimura 1970). To calculate β, we then assume even ploidy and random chromosome segregation during meiosis without double reduction (Gallais 2003), which yields

[eqn]

(Hardy 2015). Combining these results, we have

[eqn]

Since selfing will not change the expected mix of ancestries, $[eqn]$ and $[eqn]$ , we then obtain the final result for the total decrease in fitness

[eqn]

where β is given by equation (6), and we have used the fact that $[eqn]$ in the initial F1 hybrid.

Testing the predictions in maize

To test the prediction of equation (8), we used data from Yao et al. (2020). These data comprise hybrids of four inbred lines of maize (Zea mays lines A188, Oh43, W22, and B73), here labeled A, B, C, and D. In the main text, we consider only the A×B and C×D crosses, which were generated in both diploid and tetraploid form, via nitrous oxide gas treatment of the diploid zygotes (Kato and Birchler 2006; Yao et al. 2020). Appendix B: Details of reanalysis of Yao et al. (2020) maize data discusses attempts to interpret the AB×CD double cross and crosses only available as diploids.

Yao et al. selfed these initial F1 for seven generations (with deviations from the cross- and ploidy-specific F1 reported at generations 0, 1, 3, 5, and 7 in five replicate experiments). We chose ear length as the available trait closest to overall plant fitness or productivity. We then Box–Cox transformed the reported measurements to minimize the skewness of the data around their means for each generation (see Appendix B: Details of reanalysis of Yao et al. (2020) maize data for full details), but without assuming the adequacy of equation (8). We then fit equation (8) using standard least-squares nonlinear regression (Baty et al. 2015). The full model includes one β value per ploidy, and an $[eqn]$ value for each ploidy/cross combination. However, without raw F1 measurements, only the β values are readily interpretable. Figure 1 shows these β values estimated either from individual crosses, or from data pooled from both crosses.

Reanalysis of hybrid maize data from Yao et al. (2020). a) The decline in fitness over 7 generations of inbreeding is slower in tetraploid (▴) than in matched diploid (∙) maize crosses. Points and bars show means and standard errors for ear length, measured as deviations from the appropriate F1, then pooled over crosses, and transformed to minimize the skew around the within-generation means (see Appendix B: Details of reanalysis of Yao et al. (2020) maize data). Lines show the least-squares fit of the nonlinear model of equation (B1) (Baty et al. 2015) with β fixed at its expected value. b) Estimates of the parameter β, which captures the retention of heterozygosity, with 95% confidence intervals (Baty et al. 2015). Estimates of β for both pooled and individual crosses match predictions of β=1/2 for diploids, and β=5/6 for tetraploids (equation (6)). c) Estimates of ploidy from model comparison with K=2,4, or 6 indicate that the correct ploidy gives the best likelihood for both the pooled diploid and tetraploid data (left and right y-axis, respectively). Error bars indicate 2 units of log likelihood. d) Fitting equation (9), estimates the different effects of balanced and unbalanced heterozygotes in tetraploids, and agrees with predictions of β13/β22=3/4 (confidence intervals on the ratio used the “delta method”; Fox et al. 2001; Fox and Weisberg 2019).

Figure 1a shows results of fitting data from the two ploidies. As expected, the tetraploids lose fitness at a slower rate, matching the predictions of equation (6) that $[eqn]$ for diploids and $[eqn]$ for tetraploids (Fig. 1b). Conversely, we can directly estimate ploidy from data of this kind by comparing the model fit under different values of K (Fig. 1c). Indeed, we find that the true values $[eqn]$ and $[eqn]$ provide the best fit for the diploid and tetraploid data, respectively. With the tetraploid data, we can go further, and estimate the weights in equation (3), which determine the effects of dosage, via the relative contributions of balanced and unbalanced heterozygotes. In this case, with $[eqn]$ and $[eqn]$ , we fit the model

[eqn]

where

[eqn]

(which follow from recursions in Haldane 1930). Then, we can estimate their ratio (but not their individual values), which equation (3) predicts to be $[eqn]$ . The results, shown in Fig. 1d, demonstrate that the estimates of this ratio from the data are consistent with the prediction of our model.

Progressive heterosis with multiple parents

A major goal of Lundqvist (1966) was to explore the effects on heterosis both of ploidy and multiparent ancestry. Equation (2) assumes that these will also be predictable from ancestry measures involving pairs of alleles. Figure 2a shows the predictions from equation (2) for diploids and tetraploids up to the F2 generation. To clearly see the patterns of interest in this diagram, we assume constant fitness across ploidies for both the parents and F1, although neither is assumed in the analysis below.

Reanalysis of hybrid rye data from Lundqvist (1966). a) Predicted fitness for different crosses in diploids (∙) and tetraploids (▴), with ancestry from P=1,2,3, or 4 parental lines (equation (2)). The diagram assumes equal fitness for the parents and F1 for visual clarity, and with this assumption, ploidies differ only in their F2 breakdown (fitness reduction between F1 and F2; equation (12)). Fitness epistasis, I¯ (faint double-sided arrows), can vary in sign, and affect both F1 heterosis (improvement between Parents and F1; equation (11)) and progressive heterosis (improvement from adding parents to the F2; equation (16)). A maternal affect (upward facing red arrows) may increase the fitness of all F2, because they were grown from fitter F1 mothers. b) Slopes that indicate F1 heterosis and progressive heterosis should match (equation (16); dashed lines); while the F2 breakdown slope should be 1/3 smaller (equation (12); dotted line), but maternal effects make the latter difficult to test. c) F1 heterosis in diploids accurately predicts the progressive heterosis, in both diploids and tetraploids (equation (16)). Results show the difference in W¯ for the crosses indicated, normalized by the difference between diploid parents and F1. Error bars are jackknife resamples, removing any cross with ancestry from each of 6 inbred lines (black bars), or each of 15 pairs of lines (faint bars). d) Rye data, with total kernel yield transformed to maximize the correlation between ploidies for the Parent and F1 measurements (see Appendix C: Details of reanalysis of Lundqvist (1966) rye data), and individual crosses (∙) compared to means of cross types (+). The best-fit SMA regression (Warton et al. 2012) has a slope close to 1 (solid line), implying that polyploidization little changed the Mij and Iij parameters.

Three main patterns are evident in Fig. 2a. First, there is F1 heterosis under both ploidies: a large increase in fitness between the parents and F1. As shown by Hill (1982) this heterosis can involve both dominance and epistasis, and if we average over all F1, we have

[eqn]

where $[eqn]$ and $[eqn]$ are averages over all pairs of parental lineages. It follows that heterosis will differ among ploidies only if $[eqn]$ and $[eqn]$ also differ, and given $[eqn]$ that heterosis cannot be negative i.e. that F1 are at least as fit as their parents (Barton 2001; Fraïsse et al. 2016; Schneemann et al. 2022).

The second pattern in Fig. 2a is the loss of fitness between the F1 and the F2 with ancestry from two parental lines ( $[eqn]$ ). As we saw in the previous section, the strength of this F2 breakdown depends on ploidy because of the ploidy-dependent rate of heterozygosity loss. Using equations (6) and (8), we have

[eqn]

The final pattern in Fig. 2a is the steady increase in F2 fitness with the number of parental lineages, P. Sometimes, F2 fitness exceeds the F1, a phenomenon known as progressive heterosis (Washburn and Birchler 2014). To understand this, let us first consider the average fitness of all F2 derived from P parental lineages:

[eqn]

where the sums in brackets would be identical for any P-parent F2. The second sum is found to be $[eqn]$ , while the first sum equals the right-hand-side of equation (4). In the F2, the inbreeding coefficient is

[eqn]

where $[eqn]$ is the probability that two alleles, drawn at random from an F2 locus, derive from different F1 gametes; and $[eqn]$ ( $[eqn]$ ) is the probability that alleles drawn from the same gamete (different gametes) are identical by descent. Substituting this result into equation (4) gives us $[eqn]$ . We can now compare the fitness of the F1 and F2 generation (with $[eqn]$ ):

[eqn]

This shows that progressive heterosis is most likely to appear in four-parent F2 hybrids, where the second term disappears. Still, in diploids ( $[eqn]$ ) progressive heterosis will appear only when epistasis is positive ( $[eqn]$ ); while in tetraploids ( $[eqn]$ ) progressive heterosis will always appear, unless epistasis is very strong and negative, such that $[eqn]$ . Finally, we can relate the strength of the progressive heterosis to the F1 heterosis, by noting that

[eqn]

So for any ploidy, the fitness gain from adding an additional parent to the F2, is equal to a quarter of the original F1 heterosis. Note that the first line of equation (16)—the linear change with P—holds quite generally, because the total number of heterozygous loci changes linearly for both ploidies (Lundqvist 1966). However, the second line—which relates F1 heterosis and progressive heterosis—holds in polyploids only under our particular weighting of the different types of heterozygote, and thus presents a more specific test of our model. All three predictions (equations (11), (12), and (16)) are summarized in Fig. 2b.

Testing the predictions in rye

To test these predictions, we use a classic dataset from Lundqvist (1966), involving six inbred parental lines of diploid rye (Secale cereale; steel variety, lines 19–23 and 25), here labeled A–F. Unlike the modern data of Yao et al. (2020) and Lundqvist (1966) studied only two generations of hybridization, and reported only means for each cross; but the dataset includes not only the 6 parental lines and the $[eqn]$ possible F1, but also the $[eqn]$ possible F2. These data allow us to test the effects of complex ancestry, because the F2 had ancestry from either $[eqn]$ , 3 or 4 parents, depending on whether it was generated by selfing an F1 or crossing F1s with or without a shared parent (e.g. AB×AC or AB×CD). Moreover, we can also compare results for diploids and tetraploids, since colchicine treatment was used to generate synthetic tetraploids of the six inbred lines; and of the 141 crosses, only a single F1 was absent in tetraploid form. Out of the traits measured, we used total kernel yield as the best-available proxy for plant fitness. As we only have the means, we cannot unskew the data as before. However, we do have six parental lines and 14 F1s, such that we can now account for the fitness effect of polyploidization per se. Using only these fixed genotypes, we Box–Cox transformed the data to maximize the fit ( $[eqn]$ ) of a Standardized Major Axis (SMA) regression between the diploid and tetraploid measurements (Warton et al. 2012). We also used this best-fit regression to impute a value for the missing tetraploid F1 (see Appendix C: Details of reanalysis of Lundqvist (1966) rye data for full details). The resulting data are plotted in Fig. 2d.

The first notable result in Fig. 2d is that the best-fit regression line between the diploid and tetraploid fixed genotypes (solid line), is very close to a 1:1 line, but with a nonzero intercept (dashed line). This result is consistent with polyploidization inducing a constant fitness cost for all genotypes, but with the $[eqn]$ and $[eqn]$ parameters remaining constant—a hypothesis supported by model selection using the complete data set of 282 crosses (see Appendix C: Details of reanalysis of Lundqvist (1966) rye data for full details).

The second observation from Fig. 2d is the failure of the prediction of hybrid breakdown (equation (12)). In fact, the two-parent tetraploid F2 are generally fitter than the equivalent F1 (see also Fig. C3). Lundqvist (1966) also remarked on this surprising aspect of his data, and speculated that a strong maternal effect might have provided a fitness boost to all F2 plants (who came from high-fitness F1 mothers), relative to the F1 and parental plants (which came from low-fitness inbred mothers). Red arrows in Fig. 2a,b show how a maternal effect would affect our predictions. Note that, even if the effect applied equally across ploidies, it might still remove hybrid breakdown completely only in the tetraploids (as observed). We could not test directly for a maternal effect, since crosses were not generally made in both directions (Lundqvist 1966). However, given our other assumptions, models with a constant maternal effect were preferred (see Appendix C: Details of reanalysis of Lundqvist (1966) rye data).

Finally, in Fig. 2c, we test the prediction of equation (16). Given the apparent constancy of the $[eqn]$ and $[eqn]$ parameters across ploidies, we used the observed heterosis in diploid F1, to predict the effects of adding parents to F2 of both ploidies. While tetraploid two-parent F2 were slightly fitter than predicted, the quantitative agreement was otherwise good. For example, the strength of F1 heterosis in diploids gave a remarkably accurate prediction for the benefit of adding a fourth parent in tetraploids.

Discussion

We have presented a fitness landscape model to predict the fitness of hybrids of arbitrary ploidy and with an arbitrary mix of ancestry. Our results extend previous work on this model that applied only to two-parent haploids or diploids (Barton 2001; Chevin et al. 2014; De Sanctis et al. 2023) and allopolyploids with effectively diploid segregation (see the reanalysis of Brassica data from Hauser et al. 1998 by Schneemann et al. 2024). The model presented here applies under any mode of segregation as long as one can measure or predict the expected covariance in ancestry. Our model can also be seen as a generalization of classical single-locus inbreeding theory (Wright 1977, pp. 21–25) incorporating pairwise epistatic effects. As we discussed in previous work, these epistatic effects tend to become important incompatibilities in hybrids between divergent parents (Fraïsse et al. 2016; Simon et al. 2018; Schneemann et al. 2022; De Sanctis et al. 2023). As a result, this model can be used to study heterosis as well as reproductive isolation. The model can also be seen as a special case of autopolyploid quantitative genetics models that can be applied to any trait (e.g. Fisher 1965; Busbice and Wilsie 1966; Bennett 1976; Gallais 2003). Compared to these more general models, our focus on fitness (or proxies thereof) and assumptions of the underlying phenotypic model (i.e. quadratic fitness function and additivity of phenotypic effects) restrict the number of parameters and add an interpretation to their relative values in terms of the divergence history (De Sanctis et al. 2023). Our predictions about the masking effects of different types of heterozygote (e.g. AAAB vs. AABB) follow directly from these assumptions and gave a good fit to the data analyzed here (e.g. Figs. 1d and 2c).

Explaining heterosis

We applied our model to two published data sets (Lundqvist 1966; Yao et al. 2020), both of which show heterosis, i.e. increased fitness for early generation hybrids (Kölreuters 1766; Darwin 1859; Gowen 1952; Birchler 2013; Muraro et al. 2022). There are longstanding debates about the causes of heterosis—and especially the adequacy of the simplest theory: that the parental lines fixed recessive deleterious alleles, whose effects are masked in the hybrids (Bruce 1910; Keeble and Pellew 1910; Crow 1948; Lippman and Zamir 2007; Birchler 2013; Washburn and Birchler 2014). The same process of masking—sometimes called “the dominance theory” or “the complementation theory”—could, in principle, explain the progressive heterosis observed in polyploids with multiparent ancestry (Fig. 2a,b; Jones 1918; Stringfield 1950; Bingham 1980; Groose et al. 1989). Our fitness landscape model offers a generalization of the classical masking theory, and may help to resolve some of these debates.

For example, if heterosis is caused by masking of deleterious alleles, then we could expect to select out these alleles, thereby “fixing” the heterosis—but this has not always proven possible (Jones 1917; Vetukhiv 1954; Birchler 2003; Koltunow and Tucker 2003; Washburn et al. 2019). Here, we have shown that F1 heterosis (equation (11)), and progressive heterosis (equation (15)) can both arise via masking, even when there are epistatic fitness interactions between the parental alleles. Nevertheless, with epistasis, “deleterious” alleles might be deleterious only in some genetic backgrounds (Hwang et al. 2017; Xie et al. 2022), and this could make them difficult to purge. Thus, our model may help to reconcile the observation of heterosis arising from masking, with the difficulty of fixing heterosis through selection on hybrid populations (Birchler 2013).

More directly, the magnificent data of Yao et al. (2020) were originally reported as evidence against the masking theory, because no evidence was found that diploids and tetraploids lost fitness at different rates. Here, by contrast, we did find a difference in the predicted direction (Fig. 1b). One explanation is that the form of the relationship is nonlinear (equation (B1)), and so unlikely to be detected by fitting a standard linear model (Yao et al. 2020), especially if data are not transformed to meet the assumptions of the model fitting (see Appendix B: Details of reanalysis of Yao et al. (2020) maize data).

Nonlinearity and scales of measurement are relevant for a second debate about the masking theory. If deleterious mutations are successfully removed, then the theory predicts a reduction in the potential for heterosis, which is not always observed (Duvick 2005; Troyer and Wellin 2009). However, the quantitative change in the heterosis with selective improvement of the parents will depend on the fitness proxy and data transform used. So, for example, yield might show increasing heterosis, while log yield shows decreasing heterosis. As such, observed changes in heterosis (Duvick 2005; Troyer and Wellin 2009) are difficult to use as evidence either for or against the masking theory (Birchler 2013; Washburn et al. 2019; see Appendix D: Heterosis and parental improvement for more details). Whatever the outcome of these debates, the fitness landscape model used here, incorporating, as it does, different types of gene action (Manna et al. 2011; Sellis et al. 2011) may facilitate quantitative analysis of hybrid fitness data and aid progress towards a unifying theory of heterosis (Birchler 2013).

Polyploidization and hybridization

While both reanalyses presented here (Figs. 1 and 2) concern heterosis, we hope the theoretical results will apply more widely. If hybridization can contribute to biodiversity by injecting adaptive variation (Kulmuni et al. 2023; Peñalba et al. 2024), it can also do so by failing—thereby increasing reproductive isolation. Polyploidization may play a special role in both the positive and negative processes. For example, polyploid hybrids could have enhanced adaptive potential—not only by better masking their ancestors’ deleterious mutations (as discussed above) but also by combining their adaptations in a single individual, or by releasing pleiotropic constraints via specialization of the subgenomes (Soltis and Soltis 2000; te Beest et al. 2011; Van de Peer et al. 2017). On the other hand, ploidy differences can pose instant and strong barriers to gene exchange (Marks 1966; Otto and Whitton 2000; Turelli et al. 2001; Hegarty and Hiscock 2004; Soltis et al. 2014) and can themselves be caused by hybridization (Masterson 1994; Lewis 2012). Indeed, these are all possible explanations for the abundance of polyploids in nature.

In our approach, the positive and negative effects of hybridization are governed by the size and magnitude of the parameters (i.e. the $[eqn]$ and $[eqn]$ ), and the underlying phenotypic approach allows us to predict their changes under different modes of evolutionary divergence (Chevin et al. 2014; Simon et al. 2018; Schneemann et al. 2020; De Sanctis et al. 2023; Schneemann et al. 2024).

The phenotypic model also makes a naive prediction about how these quantities could change with polyploidization itself, and indeed the rye data suggested that similar $[eqn]$ and $[eqn]$ applied for both ploidies, as would be the case if homozygous phenotypic effects were unaffected by polyploidization. Of course, this result is only suggestive, and it seems unlikely that any simple fitness landscape model could capture all the diverse morphological, cytological, and genetic effects of polypoidization, including genome instability (Otto and Whitton 2000; Riddle et al. 2006; Soltis et al. 2014). Still, we hope that our predictions for the masking and dosage effects in polyploids will help to tease these factors apart.

Bibliography116

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Abbott R et al 2013. Hybridization and speciation. J Evol Biol. 26:229–246. 10.1111/jeb.2013.26.issue-2.23323997 · doi ↗ · pubmed ↗
2Barton NH . 2001. The role of hybridization in evolution. Mol Ecol. 10:551–568. 10.1046/j.1365-294x.2001.01216.x.11298968 · doi ↗ · pubmed ↗
3Bateson P . 1978. Sexual imprinting and optimal outbreeding. Nature. 273:659–660. 10.1038/273659 a 0.661972 · doi ↗ · pubmed ↗
4Baty F et al 2015. A toolbox for nonlinear regression in R: the package nlstools. J Stat Softw. 66:1–21. 10.18637/jss.v 066.i 05. · doi ↗
5Bennett JH . 1976. Expectations for inbreeding depression on self-fertilization of tetraploids. Biometrics. 32:449. 10.2307/2529514.953141 · doi ↗ · pubmed ↗
6Billiard S, Castric V, Llaurens V. 2021. The integrative biology of genetic dominance. Biol Rev Camb Philos Soc. 96:2925–2942. 10.1111/brv.v 96.6.34382317 PMC 9292577 · doi ↗ · pubmed ↗
7Bingham E . 1980. Maximizing heterozygosity in autopolyploids. In: Lewis W.H., editor. Polyploidy. New York: Plenum Press. p. 471–489.
8Birchler JA . 2003. In search of the molecular basis of heterosis. Plant Cell. 15:2236–2239. 10.1105/tpc.151030.14523245 PMC 540269 · doi ↗ · pubmed ↗