The genetic architecture of temperature-induced partial fertility restoration in A1 cytoplasm in sorghum (Sorghum bicolor (L.) Moench)
D. R. Jordan, R. R. Klein, J. Melonek, I. Small, A. Cruickshank, L. Bradburn, S. Malory, Y. Tao, A. Hathorn, C. H. Hunt, L. T. Amenu, E. S. Mace

TL;DR
This study identifies multiple genes responsible for temperature-induced partial fertility in male-sterile sorghum, which affects hybrid seed production.
Contribution
The study reveals that partial fertility in CMS sorghum is controlled by multiple distinct genes, not previously known fertility restoration genes.
Findings
43 significant SNPs were identified as being associated with partial fertility in CMS sorghum.
Partial fertility is controlled by multiple genes distinct from known fertility restoration genes.
Reduced genetic diversity in elite female sorghum lines is linked to partial fertility, not fertility restoration gene frequency.
Abstract
High-temperature-induced partial fertility in CMS sorghum is controlled by multiple genes that are distinct from genes involved in fertility restoration, contributing to reduced diversity in elite females. Cytoplasmic male sterility (CMS) is used for commercial production of hybrid seed in sorghum. CMS-based hybrid breeding systems require female parental lines (CMS lines) to remain male sterile to prevent self-pollination and enable cross-pollination to generate hybrid seed. However, genetic and environmental factors can lead to the loss of male sterility in the pollen-accepting female parent, resulting in the production of contaminating non-hybrid seeds through self-fertilization with large economic consequences. It is known that high temperatures around flowering time induce sterility breakdown, or partial fertility; however, the genetic control of this phenomenon is poorly…
Funding
- Grains Research and Development Corporation (http://dx.doi.org/10.13039/501100000980)
- Australian Research Council (http://dx.doi.org/10.13039/501100000923)
- The University of Queensland
Taxonomy
Topics: Photosynthetic Processes and Mechanisms · Plant Reproductive Biology · Genetic Mapping and Diversity in Plants and Animals
Introduction
Cytoplasmic male sterility (CMS) systems are used for commercial production of hybrid seed in a range of crops including maize, sunflowers, canola and sorghum (Bohra et al. 2016). CMS was first reported in sorghum by Stephens and Holland (1954) with the first commercial sorghum hybrids being grown in the early 1960s. In sorghum, CMS is operationalized using a three-line system often described using the three-letter coding system A/B and R where different combinations of nuclear and cytoplasmic genes result in sterility or fertility (Kim and Zhang 2018). Cost-effective production of F1 seed is achieved by growing male sterile “A” lines or female parents unable to produce functional pollen in isolated crossing blocks with male fertile “R” lines. Sterility in “A” lines is the result of mitochondrial genes in the maternally inherited cytoplasm that prevent the formation of viable pollen (CMS) (Hanson and Bentolila 2004). New female parents are developed as “B” lines which have a cytoplasm that does not contain the sterility-inducing mitochondrial genes. These B lines are then converted to male sterile “A” line versions via backcrossing to a source of sterile cytoplasm. The A/B pairs then share the same nuclear genome but have different cytoplasms. The restorer or “R” lines are used as pollen parents in hybrid production blocks and carry dominant nuclear restorer genes that counteract the impacts of the mitochondrial genes in the “A” line cytoplasm allowing production of male fertile F1 hybrids (Kim and Zhang 2018).
At the molecular level, it is thought that CMS in plants involves the expression of mitochondrial genes that produce gene products that interfere with normal pollen production (Chase 2007, Fishman and Sweigart 2018; Kim and Zhang 2018). Unfortunately, the identification of these sterility-inducing genes and their mechanisms has proven to be challenging due to the complexity of mitochondrial genome organization (Kim and Zhang 2018; Bohra et al. 2016; Kazama et al. 2019). However, in recent years advances in sequencing techniques enabled identification of multiple CMS-causing genes in more than a dozen crop species including rice, Brassicas, maize and wheat (Kim and Zhang 2018, Melonek et al. 2019). The majority of these genes show a chimeric structure made of conserved and new DNA sequences formed through multiple rearrangements of mitochondrial sequences, as well as via substoichiometric shifting and sequence changes (Hanson and Bentolila 2004; Kim and Zhang 2018). In rice, the mitochondrial gene WA352 has been shown to produce a protein that inhibits the nuclear-encoded mitochondrial protein COX11, thus modifying peroxide metabolism in the tapetum triggering premature tapetal programmed cell death and consequent pollen abortion (Luo et al. 2013). In contrast, the identification of the nuclear genes responsible for fertility restoration has proven to be relatively straightforward, with Restorer-of-fertility (Rf) genes being identified in a range of species including maize (Cui et al. 1996), rice (Ahmadikhah and Karlov 2006; Itabashi et al. 2011; Komori et al. 2004), sorghum (Jordan et al. 2010, 2011; Praveen et al. 2015) and wheat (Melonek et al. 2019). The cloned Rf genes belong overwhelmingly to the pentatricopeptide repeat (PPR) gene family (Praveen et al. 2018; Kim and Zhang 2018). 
PPR genes are a large family of RNA-binding proteins which regulate several aspects of gene expression in organelles including splicing, editing, RNA stabilization and cleavage (Barkan and Small 2014). The subset of the PPR genes involved in fertility restoration in crops, referred to as Rf-like, are usually located in genomic clusters and show high similarity to each other (Gaborieau et al. 2016; Geddy and Brown 2007). Pangenome-level analyses of clusters of the Rf-like clade show extreme variation in structure and gene content within and across species (Melonek et al. 2016, Melonek et al. 2019, Walkowiak et al. 2020).
In sorghum, fertility restoration is controlled by a relatively small number of major restorer genes including Rf1 (Klein et al. 2001), Rf2 (Jordan et al. 2010; Madugula et al. 2018), Rf5 (Jordan et al. 2011; Kiyosawa et al. 2022), and Rf6 (Praveen et al. 2015). In addition to genetic factors, fertility in cytoplasmic male sterility systems in sorghum was reported to be sensitive to environmental variables such as temperature (Brooking 1976) and photoperiod (Batch and Morgan 1974).
Although it has been rarely reported in the literature, partial fertility is widely observed in sorghum crosses particularly in crosses between fertile and sterile parent lines (Jordan et al. 2011). This partial fertility appears to be the result of minor effect genes such as those identified by Jordan et al. (2011), Maunder and Pickett (1959) and Miller and Pickett (1964) which give rise to a continuum of full and partial fertility observed in the progeny of F1 hybrids (Jordan et al. 2011). Partial fertility in sorghum is temperature sensitive with expression of fertility increasing when high temperatures occur around flowering (unpublished data).
In applied hybrid sorghum breeding programmes, genes for partial fertility must be excluded from female parent lines because the production of pollen by female parents leads to self-pollination and the presence of inbred lines in hybrid seed, rendering it unsaleable. In practice, this can be difficult to achieve because the partial fertility phenotype can only be observed when the nuclear genome of new female parents is backcrossed into sterile cytoplasm (Jordan et al. 2010). The situation is further complicated because expression of partial fertility increases as the proportion of recurrent parent genome increases during backcrossing (unpublished observation), requiring substantial investment in the cytoplasmic conversion of lines before the breeder can be confident that they are not likely to become partially fertile. A final complication results from the fact that the expression of partial fertility varies with environmental conditions and is only expressed when high temperatures occur at critical times around flowering. As a result, lines carrying genes for partial fertility may not be detected if the lines are not exposed to the relevant conditions at the critical development phase (Jordan et al. 2010). Little is known about the genetic control of partial fertility and to date only a single QTL for partial fertility has been mapped (Jordan et al. 2011). However, the nature of the trait, particularly its interaction with the environment and its dosage dependence, suggests a more complex inheritance compared with the major restorer genes. One practical implication of this complex genetic control is reluctance on the part of breeders to make genetically diverse crosses when developing new parental lines due to the frequency of partial fertility in these crosses (Jordan et al. 2010).
In this study we used association mapping on a large panel of CMS parent lines to identify QTL associated with the genetic control of partial fertility in sorghum and investigate its impact on hybrid breeding.
Materials and methods
Genetic material
Germplasm set 1: diverse A lines for association mapping
A total of 2049 female parent lines in A1 cytoplasm were grown in a trial comprising 59 lines from the Nuseed breeding programme and 1990 lines from the UQ/DAF/GRDC pre-breeding programme. The Nuseed lines consisted of a sample of germplasm from the commercial breeding programme known to vary in their expression of partial fertility. The material from the UQ/DAF/GRDC breeding programme consisted of a sample of active and historical hybrid parent lines as well as a set of new parent lines that were being developed by the programme. The UQ/DAF/GRDC parent lines shared some degree of co-ancestry.
Germplasm set 2: elite A/B and R lines for diversity analysis
Germplasm set 2 consisted of 2219 advanced A/B lines and 2135 R lines from the UQ/DAF/GRDC sorghum breeding programme that were active in final stage testing during the last 5 years. These lines were the result of multiple cycles of crossing and selection for performance in hybrid combination since hybrid breeding commenced in the mid-1960s. During this time, the programme has been managed in a way that is analogous to the early parental development phase of a medium-sized commercial breeding programme. All the A/B lines in this set were either also in set 1 or were first order relatives of the lines in set 1. The R lines (male parents) in set 2 have been strongly selected for capacity to produce fully fertile F1 hybrids in combination with male sterile female parents in A1 cytoplasm, whereas the B lines (female parents) have been selected to exhibit acceptable levels of male sterility in A1 cytoplasm. Investment in female and male populations, as measured by the number of crosses made for each pool, has been approximately equivalent during the last 30 years (unpublished data).
Field trial design
A total of six field trials were planted at Emerald Research Station in Queensland (Lat. −23.528767, Long. 148.212717) over 4 years from 2013 to 2016 (Table 1). The trials were planted from early October to late December to ensure that flowering would occur during the hottest part of the year. For the years in question, average maximum and minimum temperatures around the flowering period (November–February) were 35.4 °C and 21 °C, respectively, with the hottest days in these 4 months in the different years varying between a low of 37.2 °C and a high of 44.4 °C. Standard agronomic management and weed control for sorghum were used, and irrigation was applied when required by the crop to avoid water stress. Each plot consisted of a single 5 m row planted at 1 m row spacing. All trials used a partially replicated design (Cullis et al. 2006) with test entries replicated ~ 1.5 times and check entries replicated between 2 and 10 times. The number of entries in the trials varied between 593 and 803 genotypes and the number of plots varied between 960 and 1288 (Table 1). The concurrence of genotypes across the six trials and 4 years was sufficient for the trials to be analysed as a single multi-environment experiment (ESM Table S1). A customized row column design was used to minimize spatial error effects within each trial.
Table 1. Planting dates and composition of trials used in the study
| Trial name | Planting date | Number of genotypes | Plots | Observations | Measured days* |
| --- | --- | --- | --- | --- | --- |
| Emer1403 | 3/10/2013 | 593 | 960 | 1903 | 4 |
| Emer1404 | 18/10/2013 | 603 | 960 | 2809 | 5 |
| Emer1505 | 18/11/2014 | 780 | 1200 | 3244 | 6 |
| Emer1506 | 1/12/2014 | 783 | 1200 | 2306 | 6 |
| Emer1607 | 23/12/2015 | 803 | 1288 | 5422 | 10 |
| Emer1704 | 23/11/2016 | 682 | 1000 | 4311 | 8 |
*Indicates the total number of days the partial fertility trait was measured in each trial
Phenotyping
Sorghum heads flower in segments commencing at the top of the panicle and progressing downwards over a period of approximately 5 days. Freshly flowered anthers in sorghum only dehisce (release pollen) on the day they emerge and can be distinguished from the anthers that dehisced on the previous day by their colour. On each day of data collection, the freshly flowered portion of any heads flowering in a plot was visually rated using a 1–9 rating previously described by Jordan et al. (2011). This scale is based on the size, colour and morphology of the fresh anthers (see detailed description below) and was shown to be strongly correlated with pollen production and seed set (Jordan et al. 2011). Visible pollen and seed production commenced when anthers had a minimum rating of 6. Unpublished data had previously shown that lines with ratings of 4 or 5 under normal temperature conditions typically produce pollen when exposed to high temperatures at flowering time. On any day only a proportion of genotypes were flowering; however, the overlap of flowering periods of the genotypes enabled data from different days and trials to be analysed as a single multi-environment trial. In total, data were recorded on 36 separate days over the six trials, with between 1903 and 5422 plot × day observations recorded for each trial, giving a total of 19,995 plot × day observations across all trials. This meant that each plot in a trial was scored between 2 and 4 times on average.
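The observation totals quoted above can be checked against the per-trial counts in Table 1. The following is a minimal arithmetic sketch, assuming the plot and observation counts are read from Table 1 as shown; the dictionary layout and variable names are illustrative and not part of the original analysis.

```python
# Plot and plot-by-day observation counts per trial, as read from Table 1.
trials = {
    "Emer1403": {"plots": 960, "observations": 1903},
    "Emer1404": {"plots": 960, "observations": 2809},
    "Emer1505": {"plots": 1200, "observations": 3244},
    "Emer1506": {"plots": 1200, "observations": 2306},
    "Emer1607": {"plots": 1288, "observations": 5422},
    "Emer1704": {"plots": 1000, "observations": 4311},
}

# Totals across the six trials.
total_observations = sum(t["observations"] for t in trials.values())
total_plots = sum(t["plots"] for t in trials.values())

# Average number of times each plot was scored, per trial.
scores_per_plot = {
    name: t["observations"] / t["plots"] for name, t in trials.items()
}

print(total_observations, total_plots)
for name, value in scores_per_plot.items():
    print(f"{name}: {value:.2f} scores per plot")
```

The per-trial averages fall roughly between 2 and 4 scores per plot, in line with the statement in the text.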
Scoring system for anther sterility
A visual scoring system for sterility was previously developed by Jordan et al. (2011) and consisted of nine ratings:
Score 1 = sterile very small colourless anthers; dehiscence pore absent
Score 2 = sterile small colourless anthers; dehiscence pore absent
Score 3 = sterile small but slightly larger again anthers, may be more coloured; dehiscence pore absent
Score 4 = sterile small, but slightly larger again, more coloured anthers; dehiscence pore absent
Score 5 = sterile medium anthers, coloured; dehiscence pore absent
Score 6 = partially fertile larger coloured anthers; dehiscence pore present in some anthers, pollen present in very small quantities
Score 7 = partially fertile, larger coloured anthers; dehiscence present in most anthers, some visible pollen
Score 8 = moderately plump coloured anthers; dehiscence pore present in all anthers and pollen present
Score 9 = plump coloured anthers; dehiscence pore present, copious amounts of pollen
Genotyping and imputation
DNA was extracted from bulked young leaves of five plants of each genotype in the two populations using a previously described CTAB method (Doyle and Doyle 1987). The two germplasm sets were genotyped using medium genome-wide SNPs provided by Diversity Arrays Technology Pty Ltd (www.diversityarrays.com), which involves complexity reduction of the genomic DNA to remove repetitive sequences using methylation sensitive restriction enzymes prior to sequencing on Next Generation sequencing platforms. The sequence data generated were then aligned to version 3.1.1 of the sorghum reference genome sequence (McCormick et al. 2018) downloaded from the Phytozome 13 website (https://phytozome-next.jgi.doe.gov/) to identify single nucleotide polymorphisms (SNPs).
In total, 21,886 polymorphic SNPs were identified on the sets of lines. The overall proportion of missing data reported in the raw genotypic data sets was approximately 13%. Individual SNP markers with > 40% missing data were removed from further analysis and the remaining missing values were imputed using Beagle v5.0 (Browning and Browning 2016; Browning et al. 2018). An average imputation accuracy of 96% was achieved across both populations.
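The missing-data filter described above can be illustrated with a short sketch. This is a toy example under stated assumptions, not the pipeline used in the study: genotype calls are represented as Python lists with `None` for a missing call, the function names are hypothetical, and the imputation step performed with Beagle is not reproduced.

```python
def missing_rate(calls):
    """Fraction of missing genotype calls (None) for one SNP."""
    return sum(call is None for call in calls) / len(calls)


def filter_snps(genotypes, max_missing=0.40):
    """Keep SNPs whose missing-call rate does not exceed max_missing.

    The text removes markers with > 40% missing data, so a marker with
    exactly 40% missing calls is retained here.
    """
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if missing_rate(calls) <= max_missing
    }


# Toy genotype calls for five lines (0 and 2 are homozygous classes).
toy_genotypes = {
    "snp_a": [0, 2, None, 0, 2],        # 20% missing, kept
    "snp_b": [None, None, None, 2, 0],  # 60% missing, removed
    "snp_c": [0, None, None, 2, 2],     # 40% missing, kept
}

kept = filter_snps(toy_genotypes)
print(sorted(kept))
```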
Statistical analysis
Data from multiple trials measured over multiple days were combined into a Multi-Environment Trial (MET). The fertility scores were analysed using a linear mixed model which contained both fixed and random effects. Fixed effects included a mean value for each trial and a mean value for each date within each trial. Trials were assessed for possible spatial field effects, and spatial terms were included for each trial, as necessary. These random spatial effects included auto-regression correlations for rows and columns for each trial correlated across dates.
Correlations between dates and trials were examined by fitting genotype effects as random with correlation between each trial/date combination. These results showed very high genetic correlations between trials and dates which allowed the inclusion of a fixed genotype main effect. By fixing all the non-genetic random effects in the model, Best Linear Unbiased Estimates (BLUEs) were calculated that were representative of genotype performance across all trials and all dates. These BLUEs were used in the subsequent genome-wide association study (GWAS). All statistical analyses were conducted with the R software (www.R-project.org) using the package ASReml-R version 3 (Butler et al. 2009).
Association mapping
The imputed SNP data set for germplasm set 1 was further filtered on minor allele frequency (MAF > 0.001), resulting in a final set of 18,638 SNPs used for GWAS. The GWAS analysis was conducted using BLINK (Huang et al. 2019), in which population structure was controlled using the first three principal components generated from a principal component analysis of the SNP data. The GEC software package (Li et al. 2012) was used to calculate the number of independent tests within the SNP data. Based on this number, a Bonferroni correction was applied to set the significance threshold for the association analysis at 5.78E−6.
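As a worked illustration of the threshold, a Bonferroni correction divides the family-wise error rate by the number of independent tests. The sketch below assumes a family-wise error rate of 0.05, which is not stated in the text, and back-calculates the number of independent tests implied by the reported threshold; the actual GEC estimate is not given in this section.

```python
import math


def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_independent_tests


reported_threshold = 5.78e-6  # threshold reported in the text
alpha = 0.05                  # assumed family-wise error rate (not stated)

# Number of independent tests implied by the assumption above.
implied_tests = alpha / reported_threshold

# The same threshold on the -log10(P) scale used for GWAS results.
neg_log10_threshold = -math.log10(reported_threshold)

print(f"implied independent tests: {implied_tests:.0f}")
print(f"-log10 threshold: {neg_log10_threshold:.2f}")
```

Under this assumption the implied number of independent tests is well below the 18,638 SNPs tested, which would be expected if many markers are in linkage disequilibrium with one another.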
Diversity analysis of B and R line pools
A principal component analysis was conducted on germplasm set 2 which consisted of elite female (B) and male (R) lines from the UQ/DAF/GRDC pre-breeding programme. Nucleotide diversity was estimated for each SNP by calculating the average number of pairwise differences between sequences (π). Average π values across linkage groups and the whole genome were computed for the B and R groups using vcftools (Danecek et al. 2011). Linkage disequilibrium (LD) was estimated for each pair of SNPs by calculating the correlation coefficient, $r^{2}$. The $r^{2}$ values for the B and R groups were computed using PopLDdecay (Zhang et al. 2019), and their LD decay curves were compared with those of the Australian sorghum diversity panel (Tao et al. 2020).
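The two statistics described above can be sketched for a single biallelic site and a single pair of sites. This is a simplified illustration, assuming each inbred line contributes one haplotype coded 0 or 1 and that there are no missing calls; vcftools and PopLDdecay handle missing data, heterozygotes and windowing in ways not reproduced here, and the function names are hypothetical.

```python
from itertools import combinations


def site_pi(alleles):
    """Average number of pairwise differences at one biallelic site."""
    pairs = list(combinations(alleles, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def ld_r2(locus_a, locus_b):
    """Squared correlation between alleles (coded 0/1) at two loci."""
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Toy haplotypes for four inbred lines at three sites.
site_1 = [0, 0, 1, 1]
site_2 = [0, 0, 1, 1]  # identical pattern to site_1
site_3 = [0, 1, 0, 1]  # independent of site_1

print(site_pi(site_1))
print(ld_r2(site_1, site_2), ld_r2(site_1, site_3))
```

Both functions require polymorphic sites; a monomorphic site would give a zero denominator in `ld_r2`, which is consistent with the analysis being restricted to polymorphic SNPs.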
Candidate genes
The location of SNPs significantly associated with partial fertility was compared with the genomic locations of four previously cloned fertility restorer genes, Rf1, Rf2, Rf5 and Rf6 (Jordan et al. 2010, 2011; Klein et al. 2001; Praveen et al. 2015; Madugula et al. 2018; Kiyosawa et al. 2022), and against a set of 63 candidate genes identified by Dhaka et al. (2020). These genes were selected as strong candidates for engineering male fertility in sorghum because they show variable expression during anther development in sorghum and/or their rice orthologs have been experimentally shown to play a role in anther development.
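The positional comparison can be sketched as a simple window search on the genetic map. The example below is illustrative only: it assumes co-location means an absolute distance of at most 2 cM on the same chromosome, and the marker names, gene names and positions are invented toy values rather than data from the study.

```python
def colocated_pairs(snps, genes, window_cm=2.0):
    """Return (snp, gene) pairs on the same chromosome within window_cm."""
    pairs = []
    for snp_name, snp_chrom, snp_cm in snps:
        for gene_name, gene_chrom, gene_cm in genes:
            same_chrom = snp_chrom == gene_chrom
            if same_chrom and abs(snp_cm - gene_cm) <= window_cm:
                pairs.append((snp_name, gene_name))
    return pairs


# Hypothetical entries: (name, chromosome, position in cM).
toy_snps = [("snp_1", 5, 10.0), ("snp_2", 5, 60.0), ("snp_3", 7, 25.0)]
toy_genes = [
    ("gene_A", 5, 11.5),  # 1.5 cM from snp_1, co-located
    ("gene_B", 7, 30.0),  # 5 cM from snp_3, outside the window
    ("gene_C", 6, 10.0),  # same position as snp_1 but another chromosome
]

hits = colocated_pairs(toy_snps, toy_genes)
print(hits)
```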
Results
Average pairwise correlations of sterility scores for the same genotypes scored on different days were high, varying from 0.62 to 0.99 with an average of 0.78 indicating relatively low levels of crossover GxE interactions. This enabled the data from the six experiments and 36 days to be analysed jointly to produce a single best linear unbiased estimate (BLUE) for each line used for GWAS. The repeatability of the across site estimate was high at 0.8 indicating that genetic effects explain most of the variation in the trait in these experiments and that crossover GxE was limited. This is further supported by Fig. 1 which shows an estimate of sterility rating for each day for two “A” lines (ATx623 and A_B992422), which were used as checks across all the experiments, plotted against the estimate of the average sterility rating for a particular day for all genotypes plotted in order of ascending value. These two lines are phenotypic extremes. ATx623 is a highly sterile genotype which does not break down whereas A_B992422 is known to become fertile in response to high temperatures around flowering. Both lines show relatively similar linear positive responses in sterility rating as the overall sterility rating of the day increases indicating that crossover GxE interaction is limited.
Fig. 1. BLUEs of daily sterility ratings for two CMS lines displaying contrasting responses to high temperatures during flowering plotted against estimates of daily sterility rating for the entire dataset. ATx623 is a highly sterile female that does not become fertile under high temperatures, and A_B992422 is known to become fertile in response to high temperatures around flowering
A histogram showing the distribution of the overall BLUEs of sterility ratings for the 2049 CMS lines is presented in Fig. 2. The phenotypic values were approximately normally distributed and ranged from 2.2 to 8.6 with an average of 4.6.
Fig. 2. Histogram of phenotypic values of overall sterility ratings for 2049 CMS lines
Association mapping
An association analysis was undertaken with BLINK (Huang et al. 2019) and detected 43 significant SNPs above the adjusted Bonferroni significance threshold (Fig. 3 and Table 2). The significant SNPs were distributed across all chromosomes, with chromosomes 2 and 5 containing the highest number (7 and 8, respectively) and chromosome 7 the least (with only 1 significant SNP identified). The cM locations of the significant SNPs were predicted from the sorghum consensus genetic linkage map (Mace and Jordan 2011). Over 20% of the significant SNPs co-located within a 2 cM window with candidate genes identified from either previous studies on major fertility restoration genes in sorghum (Rf5 candidate gene on SBI-05) or genes under differential expression during anther formation identified in Dhaka et al. (2020) (see Table 2). Two QTLs, PfQT_5.2 and PfQT_5.3, were located within the RFL/PPR gene cluster on chromosome SBI-05 in close proximity to the region delimiting the Rf5 locus (Fig. 4).
Fig. 3. GWAS of sterility ratings, with significant markers above the adjusted Bonferroni threshold highlighted in red (a) and associated QQ plot (b)
Table 2. Summary of the partial fertility QTL identified and positional candidate genes relative to positions of mapped major restorer of fertility genes
| QTL name | Chr | bp | Predicted cM* | −log10(P) | Candidate genes within 2 cM |
| --- | --- | --- | --- | --- | --- |
| PfQT_1.1 | 1 | 9,834,036 | 25.7 | 7.05 | OsINV4 (Sobic.001G099700)^ϯ^ |
| PfQT_1.2 | 1 | 14,774,613 | 38.4 | 7.61 | |
| PfQT_1.3 | 1 | 54,119,665 | 60.5 | 10.21 | |
| PfQT_1.4 | 1 | 68,289,390 | 125.7 | 18.10 | ZEP1 (Sobic.001G415800)^ϯ^ |
| PfQT_2.1 | 2 | 55,667,693 | 81 | 11.55 | |
| PfQT_2.2 | 2 | 56,293,045 | 83.8 | 7.02 | |
| PfQT_2.3 | 2 | 60,652,519 | 123.1 | 6.56 | |
| PfQT_2.4 | 2 | 63,508,163 | 141 | 11.78 | |
| PfQT_2.5 | 2 | 68,226,076 | 155.2 | 7.47 | |
| PfQT_2.6 | 2 | 71,527,180 | 176.1 | 23.27 | OIP30 (Sobic.002G353500)^ϯ^ |
| PfQT_2.7 | 2 | 75,422,068 | 192.5 | 10.38 | |
| PfQT_3.1 | 3 | 52,261,094 | 60.4 | 6.19 | OsMST8 (Sobic.003G192200)^ϯ^ |
| PfQT_3.2 | 3 | 53,198,376 | 64.8 | 7.73 | |
| PfQT_3.3 | 3 | 56,759,289 | 83.6 | 7.21 | |
| PfQT_3.4 | 3 | 68,112,264 | 139.2 | 6.77 | MID1 (Sobic.003G365600)^ϯ^, OsNek5 (Sobic.003G366300)^ϯ^ |
| PfQT_3.5 | 3 | 70,705,744 | 151.6 | 8.51 | |
| PfQT_4.1 | 4 | 1,779,839 | 14.2 | 8.61 | |
| PfQT_4.2 | 4 | 2,801,430 | 20.5 | 7.53 | |
| PfQT_4.3 | 4 | 10,204,655 | 62.8 | 6.99 | OsPCBP (Sobic.004G100900)^ϯ^, HEI10 (Sobic.004G101500)^ϯ^ |
| PfQT_4.4 | 4 | 57,829,750 | 106.9 | 7.73 | |
| PfQT_4.5 | 4 | 60,854,468 | 110.6 | 23.52 | |
| PfQT_5.1 | 5 | 453,977 | 3 | 11.53 | |
| PfQT_5.2 | 5 | 2,301,604 | 17.3 | 10.28 | Rf5 (Sobic.005G027840) |
| PfQT_5.3 | 5 | 3,084,145 | 22.7 | 8.47 | |
| PfQT_5.4 | 5 | 63,658,054 | 74.1 | 6.64 | |
| PfQT_5.5 | 5 | 67,028,867 | 93 | 5.40 | |
| PfQT_5.6 | 5 | 67,573,378 | 69.5 | 6.96 | |
| PfQT_5.7 | 5 | 68,964,625 | 97.6 | 9.10 | |
| PfQT_5.8 | 5 | 71,712,867 | 118.5 | 6.16 | |
| PfQT_6.1 | 6 | 2,144,465 | 24.3 | 31.52 | |
| PfQT_6.2 | 6 | 51,899,891 | 104.4 | 9.12 | OsPLIM2b (Sobic.006G159000)^ϯ^ |
| PfQT_6.3 | 6 | 54,298,205 | 128.7 | 6.59 | OsCDPK7 (Sobic.006G192500)^ϯ^, HTH1 (Sobic.006G185600)^ϯ^ |
| PfQT_7.1 | 7 | 8,301,063 | 66.3 | 5.60 | OsZIP4 (Sobic.007G075700)^ϯ^ |
| PfQT_8.1 | 8 | 1,206,988 | 14.8 | 8.73 | |
| PfQT_8.2 | 8 | 59,829,272 | 67.1 | 5.78 | |
| PfQT_8.3 | 8 | 60,449,286 | 103.8 | 6.98 | |
| PfQT_9.1 | 9 | 1,182,457 | 14.2 | 9.46 | RPA1c (Sobic.009G012600)^ϯ^ |
| PfQT_9.2 | 9 | 4,762,731 | 53.4 | 7.04 | |
| PfQT_9.3 | 9 | 52,687,515 | 105.2 | 11.48 | OsMSH4 (Sobic.009G185700)^ϯ^ |
| PfQT_10.1 | 10 | 3,827,432 | 33.8 | 8.78 | |
| PfQT_10.2 | 10 | 7,264,827 | 37.9 | 6.06 | |
| PfQT_10.3 | 10 | 51,534,736 | 67.4 | 5.83 | |
| PfQT_10.4 | 10 | 56,987,810 | 89.7 | 8.81 | |
*Based on the sorghum consensus genetic linkage map (Mace and Jordan 2011)
^ϯ^Based on the set of 63 candidate genes identified by Dhaka et al. (2020)
Fig. 4. Co-localization of QTLs PfQT_5.2 and PfQT_5.3 with the Rf5 locus on SBI-05. The Rf-like PPR genes in this region are shown as red triangles, and their names are given above the triangles; other non-Rf-like genes are shown as blue triangles. The two markers delimiting the Rf5 locus (Jordan et al. 2011) are indicated.
Principal component analysis, relative diversity and linkage disequilibrium
Figure 5 shows the principal component analysis for germplasm set 2. The female and male parents clearly cluster into two separate groups on the first principal component, which explains ~ 20% of the variance.
Fig. 5. Principal component analysis of elite male and female inbred lines from the UQ/DAF/GRDC sorghum breeding programme, specifically 2219 advanced A/B lines (in red) and 2135 R lines (in blue) (colour figure online)
Table 3 shows the nucleotide diversity (π) within the A/B and R line pools from the UQ/DAF/GRDC breeding programme. Average diversity within the female parent pool at 0.10 was ~ 23% lower than that of the male parent pool (0.12). Although relative diversity varied between chromosomes, in all but one chromosome (where the diversity values were the same) the R lines had higher diversity. The diversity of the A/B lines was not reduced relative to R lines in linkage groups that contained fertility restoration genes.
Table 3. Nucleotide diversity (π) of elite A/B and R lines from the UQ/DAF/GRDC breeding programme with the presence of Rf genes indicated
| Chromosome | π R | π B | π % difference | Rf genes on LG |
| --- | --- | --- | --- | --- |
| 1 | 0.12 | 0.07 | 38 | |
| 2 | 0.13 | 0.11 | 16 | Rf2 |
| 3 | 0.15 | 0.11 | 26 | |
| 4 | 0.12 | 0.10 | 16 | Pf1 |
| 5 | 0.12 | 0.10 | 15 | Rf5 |
| 6 | 0.14 | 0.10 | 31 | |
| 7 | 0.12 | 0.07 | 35 | |
| 8 | 0.12 | 0.12 | 0 | Rf1 |
| 9 | 0.12 | 0.07 | 37 | |
| 10 | 0.11 | 0.10 | 11 | |
| Average | 0.12 | 0.10 | 23 | |
*Pf1 is a partial fertility gene mapped by Jordan et al. (2011)
Figure 6 shows a plot of LD decay determined by squared correlations of allele frequencies (r^2^) against distance between polymorphic sites in the elite A/B and R lines from the breeding programme contrasted with the Australian sorghum diversity panel (Tao et al. 2020). The LD decays rapidly in the diversity panel, dropping to r^2^ < 0.1 at ~ 20 kb, whereas LD did not decay to r^2^ < 0.1 until ~ 200 kb in the male parental pool, and > 600 kb in the female parental pool. Specifically, at the distance of 250 kb (which corresponds to a genetic distance of ~ 1 cM in the euchromatin; Mace et al. 2012), LD in the diversity panel is close to zero, whereas it is 0.09 in the elite R lines and almost double that (0.18) in the A/B lines.
Fig. 6. LD decay determined by squared correlations of allele frequencies (r^2^) against distance between polymorphic sites in A/B (Female), R (Male) and a diversity panel (described previously in Tao et al. 2020)
Discussion
In this paper we explored the previously undocumented genetic architecture of partial fertility in cytoplasmic male sterile lines of sorghum. High temperatures around flowering can induce partial fertility in cytoplasmic male sterile lines. Partial fertility in a commercial seed field causes a major problem for commercial seed companies potentially rendering hybrid seed unsaleable. While the genetic control of fertility restoration is well known and controlled by a relatively small number of major genes (Jordan et al. 2010, 2011), the control of partial fertility is poorly understood. In this study, we used GWAS on a panel of 2049 sorghum lines grown in six environments to identify 43 QTL for partial fertility and explore the genetic networks underlying the trait and the likely role of the trait in constraining genetic gain in commercial breeding programmes.
Partial fertility is controlled by many minor genes and major restoration genes do not make a major contribution to the trait
A set of 2049 cytoplasmic male sterile lines was grown in six field trials at Emerald, QLD, such that flowering corresponded with the hottest part of the year; average maximum and average minimum temperatures around the flowering period were 35.4 °C and 21 °C, respectively. Freshly flowered segments of heads were rated on 36 separate days generating ~ 20,000 plot × day observations. Considerable variation in partial fertility was observed among CMS lines. However, pairwise correlations of sterility scores across days were high, as was the broad sense heritability of the trait across trials (0.8). GWAS conducted on across-site BLUEs of sterility identified 43 significant marker-trait associations, indicating that partial fertility is a quantitative trait of moderate complexity. One of the potential explanations of partial fertility is that it is caused by subfunctional Rf-like PPR genes. Our evidence indicates this is not likely to be the case and that the situation is more complex. The number of QTL detected for this trait (43) is much larger than the number of known fertility restoration genes, and only one of the four major fertility restoration genes, Rf5 (Jordan et al. 2011), was located within 2 cM of a GWAS hit. Furthermore, the regions surrounding the GWAS hits were not found to be enriched for Rf-like PPR genes as would be expected if partial fertility were the result of subfunctional Rf genes.
Candidate gene analysis suggests a range of gene networks are important in partial fertility
Dhaka et al. (2020) conducted an expression analysis study of different stages of sorghum anthers and combined this with information from rice to identify a list of candidates for engineering male fertility in sorghum. Given that partial fertility occurs in response to environmental conditions at flowering, it seems likely that some genes identified as being differentially expressed in anthers and associated with fertility in rice will be associated with the partial fertility phenotype. A comparison of the location of the significant SNPs from GWAS with the location of the genes identified in Dhaka et al. (2020) identified ~ 25% overlap, with 15 out of the 63 genes co-locating within a 2 cM window of the significant SNPs from this study (Table 2). The function of these genes may provide some indications of the gene networks involved in partial fertility.
One of the genes identified by Dhaka et al. (2020) was Sobic.006G159000, an orthologue of rice OsPLIM2b, which is directly implicated in cytoplasmic male sterility in rice. In rice, OsPLIM2b was shown to interact with the cytoplasmic male sterility-related protein kinase, OsNek3, and the transcripts of both genes were found to be preferentially expressed in anthers in bi- to tri-cellular pollen (Fujii et al. 2009). Although OsNek3 was not close to a significant SNP from GWAS in this study, the closely related OsNEK5 was less than 1 cM from a significant SNP.
Three candidate genes for sterility, Sobic.003G192200 (rice OsMST8), Sobic.001G099700 (rice OsINV4) and Sobic.004G100900 (rice OsPCBP), all implicated in carbohydrate metabolism in the anthers, fall within 2 cM of significant SNPs from GWAS. In rice, OsINV4, an anther-specific cell wall acid invertase gene, and OsMST8, an anther-specific monosaccharide transporter, are downregulated by cold, resulting in pollen sterility due to interference in starch storage (Mamun et al. 2006). In addition, OsPCBP is a pollen-expressed gene in rice that encodes a calmodulin-binding protein involved in calcium signalling and localized to amyloplasts (Zhang et al. 2012). Transformation experiments indicate that disruption of this gene causes failure of pollen development, likely through disruption of starch accumulation (Zhang et al. 2012).
A further four candidate genes for sterility, based on function in rice and Arabidopsis, Sobic.001G415800 (rice ZEP1), Sobic.004G101500 (rice HEI10), Sobic.002G353500 (rice OIP30) and Sobic.009G012600 (rice RPA1c), are implicated in meiosis and DNA replication. In rice, ZEP1 is critical for controlling crossovers during meiosis (Wang et al. 2010). Its function is closely linked to that of HEI10 whose immunolocalization signals always overlap with ZEP1 signals (Wang et al. 2012). RPA1c is involved in regulating crossover formation and DNA repair in rice. It is one of the subunits of Replication protein A (RPA), a heterotrimeric protein complex that binds single-stranded DNA. In plants, multiple genes encode the three RPA subunits (RPA1, RPA2 and RPA3), and in combination with the partially sterile RPA1a, RPA1c has been demonstrated to result in sterility in Arabidopsis (Aklilu et al. 2013). Finally, OIP30 is a helicase A class of enzyme that may be a potential substrate for the pollen predominant OsCPK25/26 in rice (Wang et al. 2011).
A further two candidate genes from the gene set identified by Dhaka et al. (2020) that fell within 2 cM of a significant hit from GWAS in the current study were related to other physiological functions in the rice anthers. Sobic.003G365600 (rice MID1/OsARM1) is a transcriptional regulator that promotes rice male development under drought by modulating the expression of drought-related and anther developmental genes (Wang et al. 2017). Sobic.006G185600 (rice HTH1) is highly expressed in the epidermis of the anther in rice, where it is involved in anther cutin biosynthesis and is required for pollen fertility. Its reduced expression results in pollen abortion due to a collapsed anther wall (Xu et al. 2017).
Partial fertility, rather than the frequency of restorer genes, imposes constraints on the genetic diversity of female parents in hybrid breeding programmes
The genetic diversity of the female, or A/B line, populations of hybrid breeding programmes is low relative to that of the male, or R line, parents (Crozier et al. 2020), as demonstrated by the breeding populations in this study. Linkage disequilibrium (LD) decays much more slowly in female parents than in male parents from germplasm set 2, and, as expected, both decay more slowly than in a sample of global sorghum diversity (Tao et al. 2020). In the diversity set, R^2^ declines to zero at ~ 250 kb, while at the same distance R^2^ in the female population is ~ 0.2, double that of the male population (~ 0.1). The extent of LD in a population is the result of the complex interplay of factors such as selection, admixture, linkage and genetic drift. Typically, populations with small effective population size (Ne) experience more genetic drift than larger populations. LD between closely linked loci reflects Ne over the historical past, while LD between loosely linked loci reflects Ne in the immediate past (Hayes et al. 2003; Hill and Robertson 1966). The divergence in LD identified between the parental populations, at both closely and loosely linked loci, suggests that Ne has been lower in female (B) lines than in male (R) lines in both the recent and historical past.
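The relationship between LD and Ne invoked above can be illustrated with a short calculation. The sketch below is not an analysis from this study: it uses the standard drift-only approximation E[r²] ≈ 1/(1 + 4·Ne·c), where c is the recombination distance in Morgans, together with the interpretation of Hayes et al. (2003) that LD at distance c reflects Ne roughly 1/(2c) generations ago. It ignores selection, admixture and sampling corrections, and the recombination rate used to convert 250 kb to Morgans is an assumed, purely illustrative value.

```python
def ne_from_r2(r2, c_morgans):
    """Solve E[r^2] = 1 / (1 + 4 * Ne * c) for Ne (drift-only approximation)."""
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if c_morgans <= 0.0:
        raise ValueError("c_morgans must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c_morgans)


def generations_ago(c_morgans):
    """Approximate time depth (in generations) reflected by LD at distance c."""
    return 1.0 / (2.0 * c_morgans)


# ASSUMPTION: 1 cM per Mb, chosen only for illustration; real recombination
# rates vary widely along sorghum chromosomes.
ASSUMED_CM_PER_MB = 1.0
distance_mb = 0.25  # the ~250 kb distance quoted in the text
c = distance_mb * ASSUMED_CM_PER_MB / 100.0  # convert cM to Morgans

ne_female = ne_from_r2(0.2, c)  # R^2 ~ 0.2 in the female population
ne_male = ne_from_r2(0.1, c)    # R^2 ~ 0.1 in the male population
ne_ratio = ne_male / ne_female  # independent of the assumed value of c
time_depth = generations_ago(c)

print(f"Ne female ~ {ne_female:.0f}, Ne male ~ {ne_male:.0f}, ratio ~ {ne_ratio:.2f}")
print(f"Reflects Ne roughly {time_depth:.0f} generations ago")
```

Because both populations are compared at the same physical distance, the ratio of the two estimates does not depend on the assumed recombination rate; the absolute Ne values and the time depth do, and should not be read as estimates for these breeding populations.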
It is often stated that the major reason for the low genetic diversity within female parent lines is the fact that most sorghum landraces and germplasm lines are restorers of cytoplasmic male sterility (Menz et al. 2004). However, given that restoration of A1 cytoplasm is under the control of a small number of major genes (Jordan et al. 2010, 2011), it would seem to be relatively easy to remove these genes via phenotypic selection in test crosses or, more recently, by selection with molecular markers. This conjecture is further strengthened by the observation that the reduction in genetic diversity in B line material relative to R lines was not significantly greater for linkage groups containing restorer genes than for linkage groups that did not. It seems unlikely, therefore, that the frequency of restorer genes is sufficient to explain the observed differences in genetic diversity between the male and female parental pools. We propose that the difference in genetic diversity between the pools is primarily driven by the genetic architecture of partial fertility. The large number of loci influencing partial fertility identified in this study, coupled with their environmental and dosage-dependent expression, would make selection against partial fertility difficult. At the same time, the financial consequences of fertility breakdown are large, leading commercial breeders to be conservative in their crossing decisions when developing new female parent lines. This in turn has resulted in low diversity and high LD in female parent lines.
Conclusions
In contrast to fertility restoration, temperature-induced partial fertility in A1 cytoplasm of sorghum is under complex multigenic control. It involves nuclear genes from a range of different networks influencing a variety of biological processes, which are distinct from the major restorer genes that are almost exclusively PPR genes that act via mediation of gene expression in mitochondria. A possible explanation for these results is that in female parent lines, high temperatures partially impair the expression or function of sterility-inducing genes present in the mitochondria exposing variation in the downstream networks of genes that influence pollen production.
The presence of at least 43 regions of the sorghum genome interacting with the environment to influence partial fertility combined with the large negative commercial impacts of fertility in female parents appears to be the major cause of lower genetic variation in female vs male parents. The low variation and higher LD in female parents are likely to have constrained genetic gain and made introgression of new traits difficult. In the future, it should be feasible to use genomic prediction to introduce new variation while maintaining sterility.
Supplementary Information
Below is the link to the electronic supplementary material. Supplementary file 1 (DOCX 17 KB)
