Allelic variation at a single locus distinguishes spring and winter faba beans

Hailin Zhang; Alex Windhorst; Elesandro Bornhofen; Zuzana Tulpova; Petr Novak; Jiri Macas; Hana Simkova; Marcin Nadzieja; Jung Min Kim; Dustin Cram; Yongguo Cao; David J. F. Konkin; Olaf Sass; Gregor Welna; Axel Himmelbach; Martin Mascher; Wolfgang Link; Soon-Jae Kwon; Tae-Jin Yang; Stig Uggerhøj Andersen; Murukarthick Jayakodi

PMC · DOI:10.1038/s41588-026-02524-y·March 10, 2026

Allelic variation at a single locus distinguishes spring and winter faba beans

Hailin Zhang, Alex Windhorst, Elesandro Bornhofen, Zuzana Tulpova, Petr Novak, Jiri Macas, Hana Simkova, Marcin Nadzieja, Jung Min Kim, Dustin Cram, Yongguo Cao, David J. F. Konkin, Olaf Sass, Gregor Welna, Axel Himmelbach, Martin Mascher, Wolfgang Link, Soon-Jae Kwon

PDF

Open Access

TL;DR

Researchers identified a key genetic variant that distinguishes winter and spring faba beans, helping improve understanding of winter hardiness in this important protein crop.

Contribution

A major winter hardiness locus was identified, explaining most phenotypic variation and differentiating between winter and spring faba bean types.

Findings

01

An improved faba bean reference genome was developed.

02

A major locus for winter hardiness was identified through genome-wide association analysis.

03

Additional genetic signals for improving winter hardiness were found in the winter faba bean gene pool.

Abstract

Winter faba beans exhibit significant yield advantages over spring cultivars and hold promise for enhancing local protein production and agricultural sustainability. However, the threat of winter kill limits wider cultivation, and the genetics of faba bean winter hardiness remain unresolved. Here we develop a greatly improved faba bean reference genome and combine this with resequencing and phenotyping of winter and spring accessions to identify genetic determinants of winter hardiness. Genome-wide association analysis of frost tolerance traits identifies a major winter hardiness locus, the most strongly associated variant of which explains the vast majority of phenotypic variation and accurately differentiates between winter and spring types. Furthermore, we identify additional signals within the winter faba bean gene pool that could lead to further improvement of winter hardiness. Our…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Vicia faba(broad bean · species)

Figures14

Click any figure to enlarge with its caption.

Genomic feature distribution across *V. faba* chromosomes.a, Circos plot illustrating the distribution of various genomic features across the six chromosome-scale scaffolds of the *V. faba* genome. The tracks, arranged from outside to inside, represent the following: (1) density of transposable elements; (2) number of full-length LTR retrotransposons, identified by the DANTE_LTR pipeline; (3) position of centromeres, determined by ChIP–seq using the CENH3 antibody, shown as the ChIP/input ratio; (4) density of tandem repeats; (5) number of protein-coding genes; (6) density of 45S and 5S riboso

Population structure of spring and winter populations in faba bean.a, PCA of 406 faba bean accessions including spring group and winter group. Each point represents one accession, colored by growth habit. The percentages of variance explained by principal components 1 (6.27%) and 2 (2.54%) are indicated on the axes. b, Maximum likelihood phylogenetic tree of faba bean accessions inferred from whole-genome SNPs with 100 nonparametric bootstraps. c, Population structure of 406 faba bean accessions. Each vertical bar represents one accession, and colors indicate estimated membership proportions o

Candidate selection regions for winter survival and LF tolerance identified by GWAS and population-differentiation statistics.a, Genome-wide association statistics for LF survival (top) and winter survival (middle). The *x* axis shows genomic coordinates (Hedin/2). Bottom: genome-wide view in 10-kb/2-kb sliding windows of differentiation (*F*_ST_) within the spring faba bean and winter faba bean gene pools. b, Close-up overlap view of the significant locus and heatmap around an 8.48-Mb region identified on chromosome 1 with the previous QTL mapping. The heatmap shows SNPs (blue, alternative al

GWAS for WF traits in winter faba bean population.a, Manhattan plots showing loss of turgor (top), loss of color (middle), and the combination of loss of turgor and color traits (bottom) using the BLINK model and GEMMA MLM models. b, LD (*r*^2^) plots with the significant locus identified on chromosome 3 and chromosome 5. c, Heatmap expression of 17 genes in all samples. d–f, Box plot of the significant locus on chromosome 5 based on loss of turgor (d), loss of color (e), and combination of loss of turgor and color traits (f); box-and-whisker plots show the median and interquartile range, and

Funding5

—https://doi.org/10.13039/501100001664Leibniz-Gemeinschaft (Leibniz Association)
—https://doi.org/10.13039/501100008530EC | European Regional Development Fund (Europski Fond za Regionalni Razvoj)
—https://doi.org/10.13039/501100010963Academy of Sciences of the Czech Republic | Parazitologický ústav, Akademie Věd České Republiky (Institute of Parasitology AS CR)
—https://doi.org/10.13039/501100003725National Research Foundation of Korea (NRF)
—https://doi.org/10.13039/100010661EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)

Keywords

Genome assembly algorithmsGenome-wide association studiesGene expression

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGenetic and Environmental Crop Studies · Plant pathogens and resistance mechanisms · Agricultural pest management studies

Full text

Main

Faba bean (Vicia faba, 2n = 2x = 12) is renowned for its high protein content^1^ (~29%), superior nitrogen-fixing ability^2^ and sustainability credentials^3^. Faba bean production has grown steadily during the past 10 years in Europe and other parts of the world^4^. Faba beans are cultivated as both spring and winter types. Notably, winter faba beans have significant yield advantages (up to 47%) over spring cultivars^5^ and are mostly grown in northwest Europe and the UK. Against a backdrop of increasing temperature and drought, winter faba bean is recognized as a promising crop to simultaneously boost local protein production and agricultural sustainability via crop rotation and as a cover crop. However, winter faba bean cultivation remains limited owing to insufficient winter hardiness, including winter survival and tolerance against frost during winter and early spring (so-called late frost; LF), of the available winter faba bean cultivars. Despite attempts to identify quantitative trait loci (QTL) controlling winter hardiness^6–8^, the genetic basis for winter hardiness remains unclear.

Faba bean seeds contain vicine and convicine (VC) content, pyrimidine glucosides that cause hemolytic anemia (also known as favism) in individuals deficient in glucose-6-phosphate dehydrogenase. This represents a major restriction on faba bean production and consumption^9^ that is targeted primarily in faba bean breeding. Substantial breeding and research efforts have cloned the major VC1 gene^10^, which controls the rate of VC accumulation in seed and several low VC faba bean spring-type cultivars. Nevertheless, low VC winter cultivars of faba bean are yet to be developed^11^. The release of the first faba bean reference genome has facilitated genetic studies^12^ and genomic characterization of faba bean germplasm^13^. However, the genomic loci that have been under selection due to breeding remain largely unknown. Many of the agronomic traits are still selected phenotypically by breeders. Therefore, charting genomic regions shaped by breeding efforts, termed ‘breeding signatures’, could lead to discovery of genes underlying important agronomic and nutritional quality traits and accelerate the selection process using trait-specific molecular markers. Compared to the spring gene pool, winter types have received less attention from breeders, partially owing to a lack of diversity in the winter gene pool. Therefore, identification of genomic regions selected for in the elite spring gene pool during intensive breeding may also provide targets for winter faba bean improvement, especially with respect to quality traits.

To improve faba bean winter hardiness and breeding in general, we present an improved genome assembly of reference inbred line Hedin/2 (derived from cultivar Hedin) with improved gene annotation and addition of open chromatin annotation. In addition, resequencing of diverse elite breeding lines and winter genotypes identifies genes related to winter hardiness and genomic regions selected during breeding. The findings of our study and the greatly improved new genome sequence will enhance genomic insights into faba bean cold acclimation and facilitate genetic improvement.

Results

An improved faba bean reference sequence and gene annotation

We generated a Bionano optical map for the reference faba bean Hedin/2 and obtained a contig N50 of 41 Mb (Supplementary Table 1). Hybrid scaffolding combining contigs from the first version of the Hedin/2 v.1 reference genome, generated from PacBio HiFi reads, and the Bionano map resulted in 317 scaffolds with a greatly improved N50 value of 100 Mb (Table 1). We then used chromosome conformation capture sequencing data to order and orient the hybrid scaffolds into six large chromosomal pseudomolecules with a total size of 11.7 Gb (hereafter Hedin/2 v.2) (Extended Data Fig. 1). More than 97% of the scaffolds were assigned to precise chromosomal locations (Table 1 and Supplementary Table 2), compared to 94% in Hedin/2 v.1. There were fewer sequence gaps in the second version (335) than in the first version (5,195) (Supplementary Table 2). The size of unanchored sequences in Hedin/2 v.2 was 295 Mb, compared to 648 Mb in the first version. Further assembly evaluation using Merqury^14^ revealed comparable and yet improved accuracy relating to structural and base-level accuracy, as well as greater gene space completeness in the second version (Table 1). Although overall chromosomal collinearity between versions was high, many inverted sequence orientations were corrected in Hedin/2 v.2 (Extended Data Fig. 2). In addition, the search for telomeric sequence motif TTTAGGG in Hedin/2 v.2 revealed the presence of telomeres at most chromosome ends (Extended Data Fig. 3).Table 1. Summary statistics of faba bean genome assembly versionsQuality categoryMetricv.1v.2ContinuityContig N502.7 Mb2.7 MbHybrid scaffold N50–100 MbNumber of gaps5,195335LTR assembly index10.512.9Chromosome statusChromosome anchoring rate93.9%97.5%Chromosome anchored size11.2 Gb11.7 GbUnanchored size648 Mb295 MbStructural accuracyFalse duplications0.08%0.08%Curation (Hi-C)ManualManualBase accuracyConsensus quality value60.559.7k-mer completeness (%)96.33%96.36%Functional completenessBUSCO (n = 5,366)97.6%97.7%Hi-C, chromosome conformation capture sequencing.

Improved de novo gene annotations were generated by integrating newly generated PacBio Iso-Seq datasets from 11 tissues (Supplementary Table 3), previous RNA sequencing (RNA-seq) datasets^12^ and related protein evidence (Supplementary Table 4), resulting in a total of 35,107 protein-coding genes, including 963 genes not identified in the Hedin/2 v.1 sequence. In total, 55,283 transcripts were identified from the coding genes; this was higher than the total of 37,065 transcripts for Hedin/2 v.1, indicating notable improvement in annotation of alternative splicing (Supplementary Table 5). Further, manual curation resolved the fragmented and merged gene models using full-length lso-Seq datasets (Extended Data Fig. 4). The long intron genes were annotated well in Hedin/2 v.2 (maximum intron size of 245 kb), compared to v.1 with a maximum intron size of 145 kb supported by full-length Iso-Seq (Supplementary Fig. 1). Seed storage protein genes are crucial in grain legumes and are often tandemly clustered, making them difficult to annotate with short RNA-seq reads. Comparative analysis of seed storage protein genes between the two annotation versions (Supplementary Table 6) revealed that a cluster of five genes belonging to the legumin B4 subgroup was newly annotated and anchored in Hedin/2 v.2 chromosomes (Extended Data Fig. 5a), indicating advantages of including optical mapping and Iso-Seq data in assembly and annotation, respectively. Orthologous clustering with OrthoVenn3 across legume species identified 20,295 clusters, including 11,322 shared among all and 1,051 unique to faba bean (Supplementary Fig. 2 and Supplementary Tables 7 and 8). Overall, our improved reference genome sequence and gene annotation offer an accurate physical guide for orientation of contigs in future faba bean genome assemblies, especially for pangenome studies, and may facilitate precise characterization of agronomically and economically important genes.

Repeat composition and centromere landscape

Using a combination of structure- and similarity-based approaches, 93% of the genome was annotated as repetitive sequences (Fig. 1a and Supplementary Table 9), which is 5% higher than the proportion annotated in the previous assembly. Class I mobile elements were by far the dominant group of repeats, comprising 83% of the genome. This was mainly due to amplification of the Ogre lineage of Ty/3 gypsy long terminal repeat (LTR) retroelements, which alone constituted about 71% of the faba bean genome. Accumulation of Ogre elements has previously been identified as a major driver of genome size diversification in the tribe Fabeae, with V. faba having the largest genome^15^. Analysis of the relative age distribution of LTR retrotransposons in the current assembly revealed that the Ogre elements had undergone at least two recent amplification events and had a relative absence of older elements, suggesting rapid turnover. This was in contrast to some other LTR retrotransposon lineages, such as Tork and Retand, which did not appear to show recent activity (Supplementary Fig. 3). However, most lineages had similar age profiles to Ogre, which raised the question of why Ogre amplifies at such high copy numbers in V. faba. This is unlikely to be related to the genomic niche it occupies, as most of the abundant lineages were relatively evenly distributed across the genome, with the exception of regions dominated by satellite DNA (satDNA) (Supplementary Fig. 4). Whereas the most abundant Ty1/copia lineage was SIRE, with 5.9% of the genome, the other major repeat type was satDNA, making up 8.5% of the genome (Supplementary Table 9). In contrast to the satDNA, which formed large arrays constituting heterochromatic bands on metaphase chromosomes, mobile elements were found evenly dispersed along the whole chromosomes, except for the regions occupied by satDNA, consistent with v.1 (Fig. 1a). In comparison to v.1, there was a higher proportion of Ogres (71% versus 44.4% of the genome) in Hedin/2 v.2. Although the global outlook regarding the repeat landscape was similar between Hedin/2 v.2 and the previous version, we attribute some discrepancies in the proportion of repeat compositions between the versions to improved assembly quality and/or the use of different annotation tools.Fig. 1. Genomic feature distribution across V. faba chromosomes.a, Circos plot illustrating the distribution of various genomic features across the six chromosome-scale scaffolds of the V. faba genome. The tracks, arranged from outside to inside, represent the following: (1) density of transposable elements; (2) number of full-length LTR retrotransposons, identified by the DANTE_LTR pipeline; (3) position of centromeres, determined by ChIP–seq using the CENH3 antibody, shown as the ChIP/input ratio; (4) density of tandem repeats; (5) number of protein-coding genes; (6) density of 45S and 5S ribosomal DNA genes. All densities and feature counts were averaged over 5-Mbp windows, except for ChIP/input ratios, which were averaged over 2-Mbp windows. b, Metagene plot and heatmap visualizations of accessibility around genes in adult leaf (left) and seedling (right) tissues of faba bean. The x axis represents regions spanning 2 kb upstream of the transcription start site, the gene body, and 2 kb downstream of the transcription end site. Line plots show average accessibility across all genes, and heatmaps display accessibility of individual genes, with color intensity indicating signal strength. rDNA, ribosomal DNA; RT, retrotransposon; TE, transposable elements; TES, transcription end site; TSS, transcription start site.

We mapped the positions of functional centromeres by chromatin immunoprecipitation followed by sequencing (ChIP–seq) of CENH3, a centromere-specific variant of histone H3 (Fig. 1a), and determined the sizes of centromeres, defined as the lengths of CENH3-enriched regions. Sizes ranged from 6.18 Mb (centromere 1) to 20.6 Mb (centromere 6) (Supplementary Table 10), making these among the largest monocentric centromeres described for plants to date. In most centromeres, the coverage with CENH3 was contiguous, with relatively even CENH3 ChIP–seq enrichment profiles (for example, centromere 1). In centromere 5, the CENH3-associated domain was disrupted by a region of about 230 kb that contained satellite FabTR-53 and lacked CENH3. However, the main exception was centromere 6 (the largest centromere), for which the enrichment profile consisted of three distinct peaks separated by regions of lower CENH3 enrichment (Supplementary Figs. 5–10). This organization and the association of CENH3 peaks with satDNA arrays resembled the structure of meta-polycentric chromosomes occurring in related genera Pisum and Lathyrus^16,17^. Centromeres were enriched in satDNA (23% versus 8.5% in the whole genome). However, the distribution of centromeric satellites was not even among the chromosomes. Whereas centromere 1 was almost entirely composed of a mosaic of four satellite repeats, satDNA comprised only a minor part of the remaining centromeres, with centromere 3 lacking satDNA almost entirely. Arrays of 14 different satellite repeats were found to be associated with centromeric chromatin, most of them specific to a single chromosome (Supplementary Table 11).

Genome-wide open chromatin regions

The majority of chromatin in large eukaryotic genomes is compacted and inaccessible to transcriptional machinery. Identification of the regions that are accessible can therefore provide a general proxy for functional regions of the genome and a powerful filter for interpreting variation in noncoding regions of the genome. Using assay for transposase-accessible chromatin using sequencing (ATAC-seq), we identified 121,443 and 93,501 accessible regions representing 39.6 Mb and 32.4 Mb in the adult leaf and seedling tissues, respectively, and only ~0.3% of the assembled genome (Supplementary Table 12). Of these accessible regions, 56,374 were shared between the two tissue types. Accessible chromatin was made up of a much higher fraction of genes and gene-proximal sequences (~5%, 0–3 kb away) than distal (<0.4%, 3–30 kb away) and intergenic (<0.2%, >30 kb from gene) regions, although the number of accessible regions in intergenic regions was the highest of the four categories overall. Accessible regions that overlapped with genes tended to be larger in other areas of the genome (Supplementary Table 12). The distribution of the distances between accessible regions and genes showed a trimodal distribution (Extended Data Fig. 5b). Whereas the largest peak corresponded to the genome-wide background, it was unclear whether the secondary peaks represented accessible sites near unannotated genes or a discrete class of accessible regions that tend to occur further away from genes owing to biological phenomena such as transposable element insertion or regulatory mechanisms. As noted previously in other species^18^, chromatin accessibility is elevated near annotated transcriptional start and end sites (Fig. 1b). However, we also observed a decrease in accessibility near the end of the annotated genes, which generally corresponds to the end of the coding sequence owing to a lack of untranslated region annotation in the current version. Given that stop codon readthrough may be widespread in plants^19^, future research is warranted to determine whether this apparent footprint is due to factors that encode stop codon preference.

Genetic diversity and population structure of spring and winter faba bean

Using >10-fold resequencing of 406 accessions including 209 elite spring faba bean breeding lines and 197 winter inbred lines (Supplementary Table 13), we discovered 98,797,214 single-nucleotide polymorphisms (SNPs) and 1,751,392 insertions and deletions (indels) (Supplementary Table 14). These variants were uniformly distributed along the chromosomes except over satellite regions (Extended Data Fig. 6). Principal component analysis (PCA) and phylogenetic analysis clearly showed distinct clusters of winter and spring types (Fig. 2a,b). This relationship was strongly supported by ADMIXTURE^20^ analysis, which provided two distinct population clusters at K = 2, for spring and winter (Fig. 2c). Calculation of genetic diversity using low-admixed individuals showed higher nucleotide diversity for winter (π = 10.01 × 10^−3^) than elite spring (π = 8.79 × 10^−3^) genotypes. In addition, population genetic differentiation (FST) between spring and winter populations was relatively low (FST = 0.064), indicating that only a small number of loci might be responsible for the winter growth habit. Analysis of linkage disequilibrium (LD) decay indicated an LD decay distance of 682 kb for elite spring and 462 kb for winter genotypes (Fig. 2d). The relatively low diversity and slower rate of LD decay in elite spring lines suggest a reduction in genetic diversity due to selective breeding.Fig. 2. Population structure of spring and winter populations in faba bean.a, PCA of 406 faba bean accessions including spring group and winter group. Each point represents one accession, colored by growth habit. The percentages of variance explained by principal components 1 (6.27%) and 2 (2.54%) are indicated on the axes. b, Maximum likelihood phylogenetic tree of faba bean accessions inferred from whole-genome SNPs with 100 nonparametric bootstraps. c, Population structure of 406 faba bean accessions. Each vertical bar represents one accession, and colors indicate estimated membership proportions of each genetic cluster. d, Genome-wide LD decay across the two faba bean populations. Genome-wide LD decay is shown as the relationship between pairwise LD (r^2^) and physical distance (Mb) for the spring and winter populations. Solid lines represent the fitted LD decay curves. Dashed horizontal lines indicate the r^2^ value corresponding to half of the initial LD, and the associated physical distances reflect the extent of LD decay in each population.

Selective sweeps for improvement

To identify the putative genomic regions selected for during spring faba bean breeding, we identified selective sweeps between the elite spring and winter population using the cross-population composite likelihood ratio test (XP-CLR)^21^. A total of 47 distinct sweep regions were identified, representing 11.03 Mb (0.9%) of the Hedin/2 v.2 genome sequence. In total, 228 genes overlapped with these candidate regions (Supplementary Table 15). Gene ontology and pathway enrichment analysis of these genes indicated that they were involved in biological processes particularly related to nodulation and cold acclimation and pathways related to oxidative phosphorylation and lipid metabolism (Supplementary Table 16), which are known to participate in response to cold stress^22^. Notably, a prominent sweep was seen on chromosome 1, likely representing the VC1 locus^10^ (Extended Data Fig. 7a). To verify that region, we grouped the elite spring lines into low VC (n = 90) and normal VC (n = 119) genotypes based on passport information and performed an additional pairwise comparison using XP-CLR analysis. The results confirmed that the sweep on chromosome 1 was associated with the VC1 locus^10^ (Extended Data Fig. 7b), indicating that breeding for low VC content may have swept this locus in spring breeding lines. In addition, this sweep locus concurred with a previously fine mapped VC region^11^ (Extended Data Fig. 7c). Within this haplotype block, we identified four copies of the VC1 gene in Hedin/2 v.2, whereas there were five copies in the previously published ‘Tiffany’ genome^12^ (Extended Data Fig. 7d and Supplementary Fig. 11). Haplotype analysis with SNPs showed more missing data for certain haplotypes (Extended Data Fig. 8a), suggesting the occurrence of VC1 copy number variation in faba bean. A set of 31 indel markers significantly differentiated the low VC haplotypes from the wild types (Extended Data Fig. 8b), providing a new set of markers for selection in breeding (Supplementary Table 17).

Genome-wide association scan for winter-hardiness

We investigated the genetic basis of winter hardiness in 208 accessions comprising 180 spring types and 28 winter lines, also referred to as the ProFaba panel^13^, using genome-wide association studies (GWAS) of the winter survival rate and LF survival traits (Supplementary Tables 18 and 19). We identified a total of 1,131,399 SNPs using a combination of genotyping-by-sequencing^23^ and single primer enrichment technology^12^. Two loci on chromosomes 1 and 5 were significantly associated with both traits (Fig. 3a). Further, these two loci coincided with the peak regions of high genetic differentiation (FST) between the 209 elite spring and 197 winter inbred lines (Figs. 2a and 3a). The chromosome 1 GWAS peak also overlapped with the previously identified major QTL for frost tolerance^8^ (Fig. 3b), indicating a crucial and consistent locus related to winter hardiness. We refer to this locus as FROST RESISTANCE 1 (FR-1). A significantly associated T/C SNP, located downstream of the Vfaba.Hedin2.R2.1g002127 gene, enabled classification of accessions into two major haplotypes. Accessions harboring the CC haplotype demonstrated significantly higher winter hardiness scores compared to those with the TT haplotype (Fig. 3c,d). Deeper haplotype analysis of the chromosome 1 locus showed a clear separation of winter and spring genotypes (Fig. 3e). We further identified a set of 187 SNPs that could separate the spring and winter types (Extended Data Figs. 9 and 10 and Supplementary Table 20) and thus could be used as diagnostic markers for winter faba bean breeding. This sweep region contained a total of 30 genes, including a cluster of 14 C-repeat binding factor (CBF)/dehydration-responsive element binding factor 1 (DREB1) genes, whose role in cold acclimation has been demonstrated in diverse plant species^24–26^. To identify potential candidate genes, we conducted cold treatment transcriptome assays sampling at 18 °C, low temperatures of 4 °C and 0 °C, and freezing temperatures of −7 °C using a winter-hardy genotype (Wonjam1-ho from Korea) and a cold-susceptible spring genotype (PI 271634 from India) (Supplementary Tables 21 and 22). Expression data showed that the three CBF/DREB1 genes (Vfaba.Hedin2.R2.1g002127, Vfaba.Hedin2.R2.1g002152 and Vfaba.Hedin2.R2.1g002153) at the FR-1 locus were induced when the temperature was low and nonfreezing (4 °C), a phenomenon known as cold acclimation^25^, in the winter genotype compared to the spring accession (Fig. 3f,g). Expression patterns of 14 CBF/DREB1 genes were further validated by quantitative PCR with reverse transcription (RT–qPCR) analysis in five winter-hardy genotypes and five cold-susceptible spring genotypes (Supplementary Fig. 12). This suggests that these CBF/DREB1 genes may trigger cold responsive genes, also known as the CBF regulon^27^, that may increase tolerance to freezing in winter-hardy faba bean. Further, this locus has been reported to be syntenic with the freezing tolerance QTL identified on chromosome 6 of Medicago truncatula^28^ and P**isum sativum^28^, respectively (Supplementary Fig. 13). Similarly, among 7 genes in the chromosome 5 locus, a chalcone synthase gene (Vfaba.Hedin2.R2.5g030100) showed increased expression when the temperature was below 4 °C. Chalcone synthase is a key enzyme in the flavonoid pathway, and its role in cold acclimation has been reported in Arabidopsis^29^ and other crops^30–32^. These results indicate that the winter hardiness in faba bean is a complex process involving the CBF regulon and cold-induced flavonoids, such as anthocyanins, as additional factors.Fig. 3. Candidate selection regions for winter survival and LF tolerance identified by GWAS and population-differentiation statistics.a, Genome-wide association statistics for LF survival (top) and winter survival (middle). The x axis shows genomic coordinates (Hedin/2). Bottom: genome-wide view in 10-kb/2-kb sliding windows of differentiation (FST) within the spring faba bean and winter faba bean gene pools. b, Close-up overlap view of the significant locus and heatmap around an 8.48-Mb region identified on chromosome 1 with the previous QTL mapping. The heatmap shows SNPs (blue, alternative allele; orange, reference allele; green, heterozygosity; red, missing) in this region. c,d, Box plot of the significant locus on chromosome 1 based on LF survival scores (c) and winter survival (d). Boxes indicate medians and interquartile ranges, with whiskers extending to 1.5× interquartile range; points represent individual samples (TT, n = 180; CC, n = 28). Statistical significance was assessed using two-sided Welch’s t-tests: for LF survival, t = −17.01, d.f. = 51.8, P = 9.8 × 10^−14^, 95% CI [−70.97, −55.99], Cohen’s d = −2.35; for winter survival, t = −16.43, d.f. = 103.4, P = 6.2 × 10^−13^, 95% CI [−57.72, −45.28], Cohen’s d = −1.75. e, Differentiation phylogenetic tree for spring and winter using the selected 187 SNP markers. f, Heatmap of expression of 37 genes in all samples. g, Gene expression confirmation for three candidate genes (Vfaba.Hedin2.R2.1g002127, Vfaba.Hedin2.R2.1g002152 and Vfaba.Hedin2.R2.1g002153) by qRT–PCR across control and cold-treated conditions using five susceptible accessions (PI 271634, PI 284338, PI 284341, PI 371797 and PI 567886) and five tolerant accessions (Wonjam1-ho, PI 567891, PI 577721, PI 577722 and PI 577743). Data are shown as mean ± s.e.m. (n = 3). TPM, transcripts per million.

In addition, we analyzed the variation in frost tolerance for our 183 winter accessions. To identify associated markers, we performed GWAS for visually assessed traits such as loss of leaf (and stem) color and turgor (Fig. 4a) obtained from a recent study^33^. Two independent GWAS models consistently pinpointed two loci: one on chromosome 5 (phenotypic variation explained: 19.83%) and the other on chromosome 3 (phenotypic variation explained: 7.92%). Accessions with the GG haplotype exhibited significantly higher symptom scores compared to carriers of the CC haplotype on chromosome 5 (Fig. 4d–f). Based on local LD blocks, 13 genes were identified within a 5.57-Mb interval on chromosome 5, and 4 genes in a 125.29-kb region on chromosome 3 (Fig. 4b). Expression analysis of these genes in winter cultivar Wonjam1-ho (Fig. 4c and Supplementary Table 22) showed a strong expression pattern for the Vfaba.Hedin2.R2.5g032310 gene on chromosome 5, which encodes the flowering locus K homology domain that has been reported to repress flowering locus C in Arabidopsis under cold temperature conditions^34^. We observed significant downregulation of the gene Vfaba.Hedin2.R2.3g019128 (U11/U12 small nuclear ribonucleoprotein 48-kDa protein) on chromosome 3. However, its functional relevance to environmental stresses is yet to be determined. To further validate the two loci identified, we conducted a prediction study using two genomic best linear unbiased prediction (gBLUP) models. We aimed to assess the additional predictive power gained by incorporating the main effects of the two loci as fixed regressions, along with a random polygenic effect (model M2), compared to a genomic model that only included a random polygenic effect (model M1) (Fig. 4g). Using independent datasets for cross-validation, we found that model M2 consistently outperformed model M1 by around 30% across all traits. This indicates that the loci identified through GWAS analyses are tagging important genomic regions related to winter hardiness^33^, which have not been discovered before. Moreover, the magnitude of predictive abilities suggests that further improvements in freezing tolerance in winter-type faba bean could be achieved through molecular breeding targeting the loci identified here.Fig. 4GWAS for WF traits in winter faba bean population.a, Manhattan plots showing loss of turgor (top), loss of color (middle), and the combination of loss of turgor and color traits (bottom) using the BLINK model and GEMMA MLM models. b, LD (r^2^) plots with the significant locus identified on chromosome 3 and chromosome 5. c, Heatmap expression of 17 genes in all samples. d–f, Box plot of the significant locus on chromosome 5 based on loss of turgor (d), loss of color (e), and combination of loss of turgor and color traits (f); box-and-whisker plots show the median and interquartile range, and whiskers extend to 1.5× the interquartile range, with individual data points overlaid. Each point represents one biologically independent sample (n = 110 for the CC group and n = 70 for the GG group). Statistical significance was assessed using two-sided Welch t-tests: for loss of turgor, t = −6.22, d.f. = 117.9, P = 1.6 × 10^−8^, 95% CI [−3.54, −1.83], Cohen’s d = −1.01; for loss of color, t = −7.10, d.f. = 120.6, P = 1.1 × 10^−10^, 95% CI [−3.75, −2.12], Cohen’s d = −1.15; and for loss of turgor and color, t = −6.85, d.f. = 117.5, P = 7.4 × 10^−10^, 95% CI [−7.26, −4.01], Cohen’s d = −1.12. g, Predictive ability of a linear model fitting random genome-wide markers (M1) compared with an alternative model (M2) in which significant markers reported in a were explicitly modeled as fixed effects. The bar graph shows mean ± s.d. calculated from 1,000 rounds (n = 1,000 independent tests) of a hold-out cross-validation scheme (60:40 training/test split). TC, turgor and color.

Discussion

Faba bean is a high-yielding protein crop that is suitable for improving soil fertility and increasing the production of plant protein, which could reduce the use of meat proteins. A significant number of assembly gaps collapsed tandemly duplicated regions and repetitive elements^12^, requiring improvements to contiguity and correctness for the faba bean reference sequence. In this study, we generated an optical map as a complementary dataset to overcome existing limitations. We increased the sequence contiguity to 33-fold and, in turn, the number of sequences from thousands to a few hundred for pseudomolecule construction compared to Hedin/2 v.1^12^. The new Hedin/2 v.2 reference sequence significantly improved the representation of complex repeats, protein-coding regulatory genes and the centromere landscape, resulting in an upgraded foundational genomic tool for fine-scale genetic studies and precision breeding.

Rising temperatures pose a severe threat to spring-sown faba bean, the growth and yield of which can be adversely affected by temperatures above 30 °C (ref. ^35^). Whereas efforts are being made to identify the genetic basis of heat tolerance^36^, faba bean winter types offer distinct advantages over spring types, including yield superiority, efficiency in using soil moisture, and escape from drought and some pest attacks^5^. Although strict prolonged cold exposure (known as vernalization) may not be required for winter types of faba bean, insufficient winter hardiness creates a risk of winter kill. Our comprehensive analysis using multiple datasets detected potential genomic loci for improving winter hardiness in faba bean. Notably, the chromosome 1 locus coincided with frost tolerance QTL identified by independent previous studies^8,37,38^. Syntenic conservation of this locus among important grain legumes indicates that translational approaches could be used to bestow winter hardiness on related grain legumes such as pea via genetic manipulations. Our new genomic resources and findings will enable further studies of the genetic basis of cold acclimation in related temperate grain legumes and active breeding of nutritious winter faba bean.

Methods

Optical genome map and hybrid scaffolding

A total of 1.5 million G1 nuclei from young leaves of faba bean inbred line Hedin/2 (highly homozygous) grown in a controlled environment were purified using flow cytometry. Sorted nuclei were embedded in agarose miniplugs and treated with proteinase K as described in ref. ^39^. Then, 490 ng of high-molecular-weight DNA released from the miniplugs was directly labeled at DLE-1 recognition sites (CTTAAG motif) and stained following a standard Bionano Prep Direct Label and Stain protocol (Bionano Genomics). Labeled and stained high-molecular-weight DNA was analyzed on the Bionano Genomics Saphyr platform using a single Saphyr G1.2 chip. In total, 1,842.6 Gb of single molecule data >150 kb, corresponding to 142× coverage of the V. faba genome, was collected and used to generate a de novo optical genome map (Supplementary Table 1) with Bionano Solve (v.3.6.1_11162020) using ‘optArguments_nonhaplotype_noES_noCut_DEL1_saphyr.xml’. The contig sequence from version 1 and the optical genome map assembly were used as input files for an automated hybrid scaffolding (HS) pipeline integrated with Bionano Solve (v.3.6.1_11162020). The hybrid scaffolding pipeline was run with default parameters and the ‘Resolve conflict’ option for conflict resolution.

Pseudomolecule construction and validation

The scaffolds were then used to construct pseudomolecules with the TRITEX pipeline^40^. Finally, the order and orientation of scaffolds in each chromosome were manually inspected and corrected with evidence from Omni-C published based on our previous assembly^12^. We also used Merqury v.1.3^14^ and TR_retriever v.2.9.0^41,42^ to evaluate genome completeness and consensus accuracy. Gene space completeness was evaluated using BUSCO v.5.8.2^43^ analysis with Fabales database 12.

Repeat annotation

Tandem repeats and satellites were annotated using TideCluster v.1.0^44,45^ (https://github.com/kavonrtep/TideCluster), a wrapper for TideHunter^44^. Satellite repeats with a monomer size ranging from 40 kb to 3 kb and a minimum array length of 5 kb were annotated using default TideCluster settings. Satellites with a monomer size of 10–39 bp and a minimum array length of 5 kb were identified using TideCluster with parameters -T ‘-p 10 -P 39 -c 5 -e 0.25’ -m 5000. LTR retrotransposons were annotated using DANTE v.0.1.8^46^ (https://github.com/kavonrtep/dante) and the DANTE_LTR v0.2.3.2^46^ pipeline (https://github.com/kavonrtep/dante_ltr) on the RepeatExplorer Galaxy server^47^. The sequences of the identified LTR retrotransposon elements were used to create a custom library of LTR retrotransposon elements using the ‘dante_ltr_to_library’ script from the DANTE_LTR repository. A custom library of class II transposable elements was obtained using RepeatExplorer clustering protocol 1^48^ on unassembled Illumina paired-end reads. Contigs corresponding to class II retrotransposons with a minimum read depth of 5 reads and a minimum length of 100 bp were obtained using tools on the RepeatExplorer Galaxy server. Consensus sequences of ribosomal RNA gene arrays including intergenic spacer sequences were fully reconstructed from the RepeatExplorer contigs. A custom library of LINE elements was created by extracting regions with LINE protein-coding domains identified by DANTE, together with upstream and downstream 4 kb flanking regions. The extracted genomic sequences were split into 100-nt fragments and analyzed by RepeatExplorer clustering. Contigs corresponding to LINE elements with a read depth of at least 3 reads and a minimum length of 150 nt were converted into a custom library. All custom libraries were concatenated to form a library for RepeatMasker v.4.1.5^49^ search. The RepeatMasker search was performed on the RepeatExplorer Galaxy server with options ‘-xsmall -no_is -e ncbi’. All regions annotated as mobile elements with RepeatMasker based on a custom library search that overlapped with satellite repeats annotated by TideCluster were removed from the annotation using bedtools^50^ with the ‘bedtools subtract’ command. The resulting GFF3 was then merged with the DANTE annotation using a custom R script (https://github.com/kavonrtep/granges_tools/blob/main/merge_repeat_annotations.R). Classification of mobile elements in the annotation files corresponded to the classification system used in REXdb^51^. For the final repeat-masking process, all the above repeat annotation GFF3 files were consolidated. We merged the annotated regions into a single BED file using the bedtools v.2.31.1 merge tool^50^.

Analysis of centromeres

Centromere positions in the assembly were identified using ChIP–seq with CENH3 antibody CenH3-2_VF (raised against peptide CQT PRH ARE TQE KKK RRN KPG^52^) as described previously^12^. Duplicate experiments, including independent chromatin preparations, were performed, and the resulting immunoprecipitated fragments and the corresponding control samples (input; digested chromatin not subjected to immunoprecipitation) were sequenced on an Illumina platform (Admera Health) in paired-end, 150-bp mode. The resulting reads were quality-filtered and trimmed using Trimmomatic v.0.39^53^ (minimum allowed length: 100 nt); this yielded 151–201 million reads per sample, which were mapped onto the assembly using Bowtie2 v.2.4.2^54^, with options -p 64 -U. Subsequent analysis was performed on the full output from Bowtie2 and on output from which all multimapped reads had been filtered out. Filtering of multimapped reads was performed using Sambamba v.0.8.1^55^ with options ‘-F [XS] == null and not unmapped and not duplicate’. Regions with statistically significant ChIP/input enrichment ratio were identified by comparing ChIP and input mapped reads using epic2^56^, with the parameter ‘–bin-size 200’. Alternative identification of enrichment was performed using MACS v3.0.0a6^57^, with options ‘–broad–broad-cutoff 0.1’. The ChIP/input ratio was calculated for plotting purposes using bamCompare v.3.5.1 from the deepTools package^58^. The program was run with the parameter ‘–binSize 200’ to calculate the log_2_ ratio for the 200-nt window size. The resulting data were plotted using the rtracklayer R package^59^.

Iso-Seq transcriptome assembly

Tissue samples, including stem, flower, closed petals, young leaf, mature leaf, mid-maturation pod, expansion-stage pod, mid-maturation embryo, expansion-stage whole seed, control (no rhizobia) roots and roots with root nodules 2 weeks and 30 days after rhizobia inoculation were collected from plants of the Hedin/2 genotype grown in a glasshouse with a 24-hour cycle of nominally 18 °C, 16 h light, 400 µmol m^−2^ s^−1^ rhizobia and 14 °C, 8 h dark, 0 µmol m^−2^ s^−1^. Raw PabBio Iso-Seq data were processed using default parameters with the IsoSeq3 (https://github.com/PacificBiosciences/pbbioconda) analysis tool in SMRT Link. The processing steps encompassed stages including circular consensus sequence generation, trimming, refining, clustering and polishing. Initially, reads were categorized as either full length or nonfull length based on the presence of the 5′ primer and 3′ primer or 3′ poly-A tail. Full-length nonchimeric (FLNC) reads were identified as those containing both 5′ and 3′ complementary DNA (cDNA) primers as well as a poly-A tail. Reads lacking any of these defining tags were classified as nonfull-length reads. Further processing involved trimming FLNC reads to eliminate barcoded and unbarcoded cDNA primers, as well as unwanted primers, and aligning the reads in the 5′ to 3′ orientation. Before generation of clustered consensus sequences, FLNC reads were subjected to refinement steps to remove poly-A tails and artificial concatemers. Subsequently, a polishing arrow model, which included Quiver tracks, was used to refine multiple FLNC reads derived from identical isoforms, with the aim of obtaining nonredundant isoform sequences. The final FLNC transcript sequences were then classified as high-quality transcripts with ≥99% postcorrection accuracy. Clustered transcripts were aligned to the reference genome Hedin/2 v.2 using minimap2 v.2.24^60^ (Supplementary Table 3). The transcripts were combined and refined using cDNA_Cupcake (https://github.com/Magdoll/cDNA_Cupcake) after initial filtration, which excluded transcripts unsupported by a minimum of two full-length reads per sample. The combined transcripts were used to annotate the genome.

Gene annotation

Gene annotation was conducted using BRAKER v.2.1.6^61^ with the etpmode and long reads protocol. RNA-seq data from 26 samples were downloaded^12^ and aligned to the Hedin/2 v.2 sequence using HISAT2 v.2.2.1 with standard parameters (–phred64–sensitive–no-discordant–no-mixed -I 1 -X 1000)^62^. Iso-Seq full-length transcripts as described before were used as input for the long reads protocol. The BRAKER long reads annotation pipeline was performed in four steps: ab initio prediction, homology-based prediction, RNA-seq data-based prediction and Iso-Seq data-based prediction. The final gene models were evaluated using both RNA-seq and Iso-seq transcriptome data, followed by manual correction, including removal of TE domain genes. Gene function annotation was performed by a BLAST search using predicted protein-coding genes against the following databases: eggNOG 5.0^63^, Gene Ontology^64^, KEGG^65^, NR (https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/), InterPro^66^, TrEMBL^67^ and SwissProt^68^. Gene ontology and KEGG pathway enrichment analyses for genes were using R packages Clusterprofiler^69^ and GOplot, with adjusted P < 0.1 and q < 0.3 (Benjamini–Hochberg method) as the cutoff criteria. We ran OMArk v.0.3.1 (https://omark.omabrowser.org/home/; release 2024.06)^70^ and BUSCO v.5.8.2^43^ to evaluate gene completeness. Protein sequences of faba bean were compared with those of seven other legume species (Lupinus angustifolius, M. truncatula, P. sativum, Glycine max, Phaseolus vulgaris, Vigna radiata and Vigna angularis) using OrthoVenn3^71^ (https://orthovenn3.bioinfotoolkits.net).

ATAC-seq library preparation and analysis

ATAC-seq libraries were prepared in duplicate. Plants were grown in a controlled environment in Pro-mix BX medium (Premier Tech), using a 18-h photoperiod at 22 °C day/8 °C night. Aerials tissues from 1-week-old seedlings or fully expanded but young leaves from approximately 1-month-old plants were flash-frozen into liquid nitrogen. Flash-frozen tissue (0.5 g) was ground in liquid nitrogen, suspended in nuclei isolation buffer (NIB, 10 mM MES-KOH (pH 5.4), 10 mM NaCl, 10 mM KCl, 0.1 mM spermine, 1 mM spermidine, 1 mM DTT and 1% Triton X-100 and protease inhibitors (Source)), filtered twice through 30-μm filters into ice-cold NIB + 1.8 M sucrose, washed twice following centrifugation at 1,000g for 15 min and resuspended in NIB. Libraries were prepared from serial dilutions in 50-μl reactions using a Nextera DNA Library Prep Kit (Illumina) with 2.5 µl TDE1 and a 30-min incubation at 37 °C, and amplified as described in ref. ^72^. Libraries were sequenced using paired 125-base reads on an Illumina HiSeq 2500 system (Illumina). ATAC-seq data analysis followed the ENCODE paired-end ATAC-seq pipeline v.2.2.2^73^ (10.5281/zenodo.7631189) with the following modifications: reads mapping to chloroplast sequences were also removed; no blacklist was used; read alignment with Bowtie2 used a maximum insert size of 1,000 bp (-X 1,000); and up to 10 alignments were reported (-k 10). Only unique alignments (primary alignments with MAPQ > 30) were used for subsequent analysis. Heatmaps and metaplots were generated using deepTools^58^.

SNP calling

The resequencing reads were mapped to the Hedin/2 v.2 reference genome using minimap2 v.2.24^60^ with default parameters. SAMtools v.1.16.1^74^ was used to convert mapping results into BAM format, and NovoSort v.3.06.05 (http://www.novocraft.com) was used to sort the BAM file, and BCFtools (v.1.8)^75^ was used to call SNPs and short indels with parameters -q 20 -Q 20–excl-flags 3332. Finally, indels and SNPs that were multiallelic or had >10% missing calls, >10% heterozygosity or minor allele frequency < 0.001 were removed. SNPs with minor allele frequency < 0.05 were further removed for phylogenetic tree structure, GWAS, LD decay, PCA and population structure analyses. SNP density and genetic diversity across each chromosome were counted with a 10-kb sliding window using a Perl script (https://github.com/helen5588/IPK_faba_codes/blob/main/s2_snp_calling/get_snp_stat.pl).

Population genetics analysis

We used the whole-genome SNPs to construct a maximum likelihood phylogenetic tree with 100 bootstraps using Phylip v.3.696^76^. iTOL (http://itol.embl.de) was used to color the phylogenetic tree. Admixture v.1.3^77^ was used to infer the population structure, with K values ranging from 2 to 10, and to run the cross-validation error procedure. To identify the pattern of LD among different faba bean accessions, pairwise squared correlation coefficients (r^2^) were calculated using PopLDdecay v.3.30^78^. Rscript v.3.5.1^79^ was used to plot average r^2^ values between all pairs of genotypes.

Selective sweep analysis

To identify differentiated loci between subpopulations, we performed a sliding window (10 kb) analysis based on pairwise estimates of genetic differentiation (FST)^80,81^ and genetic diversity (π ratio) within each population. The π and FST values were calculated with a Perl script (https://github.com/helen5588/IPK_faba_codes/blob/main/s5_sweep)^82^. The window size was 10 kb with a step size of 2 kb. To detect sweep signals of selection in the faba bean collection, we employed XP-CLR v.1.0, as described in ref. ^21^. The XP-CLR analysis was performed with a 50-kb sliding window and a 10-kb step between spring versus winter and normal VC type versus low VC type in the faba bean population.

Phenotyping

The faba bean samples (210 total; 207 spring-type and 3 winter-type) originated from breeding lines from Norddeutsche Pflanzenzucht Hans-Georg Lembke KG (NPZ). The 196 (194 winter-type and 2 spring-type) faba bean inbred lines (F > 9) were derived via single-seed descent from the Göttingen Winter Bean Population. Independently, 183 winter faba bean inbred lines were phenotyped for tolerance to freezing during winter (winter frost; WF^33,83^) and early spring (LF^33,84^) under controlled temperature conditions in n = 10 and n = 5 replicated climate chamber experiments. Visual frost stress symptoms were scored as loss of leaf (and stem) turgor and color on hardened or dehardened (WF/LF) juvenile plants during three successive frost stress treatments at −16/−13 °C, −18 °C/−15 °C and −19/−17 °C. Scores from 1 to 9 (not full symptoms) were summed to the total loss of turgor, loss of color, and grand sum (loss of turgor and color). Trait repeatability was high (0.77≤ h^2^ ≤0.92), and analysis of variance revealed significant genotypic variance among lines.

A subset of the ProFaba panel (208 of 239 accessions) was tested for winter hardiness at experimental station Reinshof, University of Göttingen, Lower Saxony, Germany (51.48° N, 9.93° E) in 2021–2022. The ProFaba panel and two check cultivars—Augusta (winter-type, NPZ) and Tiffany (spring-type, NPZ)—were tested in double rows in an unreplicated, randomized complete block design. Within the block, the n = 180 spring-type and n = 28 winter-type faba bean accessions of the panel were each arranged in a randomized complete block design to prevent neighboring effects later in the season. Checks were replicated 21 times each to estimate repeatability (h^2^ > 0.80). Winter survival and LF survival (number of surviving plants) were assessed after winter in February and after LF in April. The winter survival rate and LF survival rate were calculated in relation to emerged plants.

GWAS analysis

GEMMA v.0.98.5^85^ and BLINK^86^ were used for the association analysis. The first three PCA values (eigenvectors), which were derived from whole-genome SNPs, were used as covariates to correct for population stratification. The most highly significant SNPs associated with freezing tolerance were compared with the positions of Hedin/2 protein-coding genes. Orthologs of the gene-overlapping variants were identified using MCScanX^87^. We employed LDBlockShow v.1.40^88^ to generate and visualize the LD pattern of haplotype block.

Genomic-assisted prediction

We evaluated the ability of a genomic-enabled model to predict frost stress tolerance as loss of turgor, loss of color, and loss of turgor and color in our n = 183 winter-type faba bean lines using WF^33,83^ and LF^33,84^. The WF dataset was also used for a GWAS and is referred to here as the discovery set. Predictive ability was assessed by repeated random cross-validation, in which 60% of the lines were assigned to the training set and 40% to the testing set. In each of the 1,000 iterations, two gBLUP models were fitted to the data of the discovery set, and breeding values for the testing set were correlated with observed data from the LF dataset. The first model (M1) took the form $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\bf{y}}={\bf{1}}\mu +Z{\bf{u}}+{\bf{e}}$$\end{document}$ , where y is the response vector, μ is the intercept with associated vector 1 of 1 s, u is the vector of breeding values with associated incidence matrix Z and e is the vector of residual effect. Vectors u and e are assumed to be $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\bf{u}} \sim {\rm{N}}(0,\,G{\sigma }_{{\bf{u}}}^{2})$$\end{document}$ , where G is the genomic relationship matrix^89^, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\bf{e}} \sim {\rm{N}}(0,\,{\sigma }_{{\bf{e}}}^{2})$$\end{document}$ , respectively. The second model (M2) incorporated the genotypes of two QTLs, detected by GWAS analysis on the discovery set, as fixed effects. The structure of this model was analogous to that of M1, except that the term $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\bf{1}}\mu$$\end{document}$ was replaced by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X\beta$$\end{document}$ , where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X$$\end{document}$ represents the incidence matrix for fixed effects $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta$$\end{document}$ . We performed cross-validation using the mixed.solve function of R package rrBLUP (v.4.6.3)^90^.

Frost treatments, RNA-seq and qRT–PCR

Wonjam1-ho was developed from PI 469181, a new variety with superior adaptability to the winter environmental conditions in South Korea. This variety was generated by irradiation with 100-Gy gamma rays using a Co gamma irradiator^61^ (150-TBq capacity; ACEL). The faba bean accessions were obtained from the USDA Agricultural Research Service. Briefly, 1,000 M_1_ seeds were planted in the experimental field at the Korea Atomic Energy Research Institute (35.5699° N 126.9772° E, Jeongeup, Jeollabuk, Republic of Korea) in 2015, and 2,000 M_2_ seeds were pooled for next generation. Subsequently, a total of 2,000 M_2_ plants were grown and harvested individually in 2017. M_3:5_ seeds were then propagated from single plants, and M_5:7_ plants were generated by line selection during 2020–2023. Frost tolerance was tested in the experimental field of the Korea Atomic Energy Research Institute with a faba bean mutant population consisting of 291 mutant lines during 2022–2023. Finally, we selected the 230-5 mutant variety with high frost tolerance named wonjam1-ho. For investigation of the transcriptome response for frost tolerance of Wonjam1-ho (tolerant) and PI 271634 from India (susceptible), both varieties were placed in 15-cm plastic pots with commercial soil (Hungnong) and grown in a growth chamber (Vision Science) set at 18 °C for 20 days. Next, the temperature in the growth chamber was gradually decreased at a rate of 1 °C per hour until it reached 4 °C, at which point it was held for 3 days (12-h day/12-h night). The temperature was further lowered by 1 °C per hour to −7 °C, maintained for 12 h (night), and then increased in reverse order to 18 °C for recovery. Leaves were collected after treatments (at 18 °C, 4 °C, 0 °C and −7 °C), immediately frozen in liquid nitrogen, and used for RNA extraction with three replicates (Supplementary Table 21). Raw fastq data were filtered using SOAPnuke v.2.1.8^91^, then mapped to the gene sequence using bowtie2 v.2.3.5.1^54^. The expression data for these samples were used to quantify gene expression using RSEM v.1.3.3^92^. The qRT–PCR experiment was conducted in five cold-susceptible accessions (PI 271634, PI 284338, PI 284341, PI 371797 and PI 567886) and five cold-tolerant accessions (Wonjam1-ho, PI 567891, PI 577721, PI 577722 and PI 577743). Leaf tissues were sampled under different temperature conditions, in triplicate. Samples were immediately frozen and ground in liquid nitrogen with a mortar and pestle. Total RNA was extracted using TRIzol reagent (Invitrogen) and treated with DNase I (Ambion). First-strand cDNA was synthesized using SuperScript III First-Strand Synthesis SuperMix (Invitrogen). Gene expression was analyzed on a CFX96 real-time PCR system (Bio-Rad) with iTaq Universal SYBR Green Supermix (Bio-Rad). All steps followed the manufacturers’ instructions. Relative expression was calculated according to the 2^−ΔΔCt^ method with cyclophilin (CYP2) as the reference gene. PCR conditions were 95 °C for 10 min, then 40 cycles of 95 °C for 15 s, 50 °C for 15 s, and 72 °C for 30 s. The primers were designed using Primer3plus (https://www.primer3plus.com/) (Supplementary Table 24).

Statistics and reproducibility

All statistical analyses were performed using R. To ensure appropriate statistical comparisons, data were first tested for normality. Phenotypic values were evaluated using the D’Agostino–Pearson omnibus K^2^ test to assess conformity to a Gaussian distribution. Phenotypic differences between two genotypes were evaluated with Welch’s t-test. For comparisons involving more than two groups, analysis of variance was performed, followed by suitable post hoc tests. The null hypothesis was rejected at P < 0.05. Sample sizes, precise P values, and the statistical tests used are indicated in the respective figures or figure legends.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41588-026-02524-y.

Supplementary information

Supplementary InformationSupplementary Figs. 1–13. Reporting Summary Peer Review File Supplementary Table 1. Supplementary Tables 1–24.

Bibliography9

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1FAOSTAT, Food and Agriculture Organization of the United Nations (FAO), http://www.fao.org/faostat/en/#data/TP (2022).
2Windhorst, A. et al. Genome-wide association analyses identify QTL for tolerance to freezing during winter and early spring as a basis for in-depth genetic analysis and implementation in winter faba bean (Vicia faba L.) breeding. Preprint at bio Rxiv (2024).
3Bornhofen, E. et al. Genetics of faba bean yield and yield stability. Preprint at bio Rxiv (2024).
4Smit, A. F. A., Hubley, R. & Green, P. Repeat Masker Open-4.0. https://www.repeatmasker.org/ (2013–2015).
5Lee, J. et al. ENCODE-DCC/atac-seq-pipeline, Zenodo, 2.2.2, 10.5281/zenodo.7631189 (2023).
6Felsenstein, J. PHYLIP (Phylogeny Inference Package) Version 3.5c (University of Washington, 1993).
7R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2013).
8Windhorst, A., Skovbjerg, C. K., Andersen, S. U. & Link, W. Improving overwintering in times of climate change - a GWAS for late-frost tolerance of winter faba bean. In Proceedings of the 73rd Annual Conference of the Vereinigung der Pflanzenzüchter und Saatgutkaufleute Österreichs (Saatgut Austria), “New plant breeding techniques: myths, facts and reality”, 21–22 November 2022, Raumberg-Gumpenstein, Irdning, Austria 29–36 10.5281/zenodo.7875597 (2023).