Needle and Branch Trait Variation Analysis and Associated SNP Loci Mining in Larix olgensis
Ying Cui, Jiawei Yan, Luping Jiang, Junhui Wang, Manman Huang, Xiyang Zhao, Shengqing Shi

TL;DR
This study identifies genetic markers linked to needle and branch traits in Larix olgensis, aiding in its genetic improvement.
Contribution
The paper reports the first SNP loci associated with needle and branch traits in Larix olgensis using GWAS and KASP validation.
Findings
A total of 161 SNP loci were significantly associated with seven needle and one branch-related trait.
Twenty KASP markers were developed and validated for phenotypic variation in L. olgensis.
Three specific KASP markers showed polymorphism and were successfully amplified for trait association.
Abstract
Needles play key roles in photosynthesis and branch growth in Larix olgensis. However, genetic variation and SNP marker mining associated with needle and branch-related traits have not been reported yet. In this study, we examined 131 samples of unrelated genotypes from L. olgensis provenance trials. We investigated phenotypic data for seven needle and one branch-related traits before whole genome resequencing (WGRS) was employed to perform a genome-wide association study (GWAS). Subsequently, the results were used to screen single nucleotide polymorphism (SNP) loci that were significantly correlated with the studied traits. We identified a total of 243,090,868 SNP loci, and among them, we discovered a total of 161 SNP loci that were significantly associated with these traits using a general linear model (GLM). Based on the GWAS results, Kompetitive Allele-Specific PCR (KASP), designed…
Taxonomy
Topics: Genetic Mapping and Diversity in Plants and Animals · Plant Reproductive Biology · Plant Molecular Biology Research
1. Introduction
Larix olgensis A. Henry is a tall deciduous conifer and the main native tree species used for afforestation and timber production in Northeast China, with significant ornamental, economic, and ecological value [1]. Research on L. olgensis has mainly focused on conventional breeding, which is characterized by a long breeding cycle and low efficiency [2,3,4]. Molecular breeding applies molecular biology to animal and plant breeding at the molecular level and includes molecular marker-assisted breeding and genetic engineering breeding. Advancements in molecular biology have led to the development of molecular breeding techniques, which offer powerful tools for enhancing forest genetic improvement. These techniques employ genetic markers such as randomly amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), simple sequence repeats (SSR), and single nucleotide polymorphism (SNP) markers. By incorporating these techniques, the accuracy of genetic analysis and the effectiveness of breeding can be significantly enhanced [5,6]. With the development of high-throughput sequencing technologies, SNP markers distributed across the entire genome have emerged as the third generation of molecular markers. They have been widely used in major candidate gene mapping, molecular marker-assisted selection (MAS) breeding, and cultivar fingerprinting. MAS, which is based on the close linkage between molecular markers and target traits, has emerged as a prominent approach in the study of quantitative traits [7,8]. At present, there are few studies on the genetic basis of larch needle and branch traits, and molecular markers associated with these traits have not been developed.
Leaves are the main organs where plant photosynthesis, respiration, and transpiration occur and the main site for energy synthesis and metabolism [9]. Diversity in leaf morphology can reflect genetic diversity, and it is also an important clue to understanding the genetic variation of plants [10]. However, previous studies on larch populations mainly focused on ecological traits and metabolomics [11,12], and only a few studies have implemented genetic analyses and molecular markers. Leaf photosynthetic pigments (such as chlorophylls and carotenoids) can absorb, transfer, and transform light energy and are important indices for measuring plant photosynthesis and environmental stress [13,14,15]. Chlorophylls, fat-soluble plant pigments, comprise two major compounds: chlorophyll a (Chl a) and chlorophyll b (Chl b), which are green pigment photoreceptors present in all photosynthetic organisms [16]. Carotenoids comprise plant pigment compounds with a wide range of colors that act as accessory pigments to chlorophylls in photosynthesis [17]. Therefore, to understand the growth and physiological traits of L. olgensis needles, it is necessary to develop molecular markers and provide a scientific basis to enhance the genetic improvement of larch.
Genome-wide association studies (GWAS) have contributed to substantial advances in crop and forest tree research [7,18]. However, in conifers, whose chromosome number is 2n = 24, the contribution of GWAS has been more limited due to their large genome size (~10–40 Gb), which presents a significant challenge in developing a sufficient number of markers [19]. Furthermore, the number of trees genotyped is often insufficient, with a typical sample size of fewer than 500 individuals [20,21]. Recently, reference genomes and transcriptome assemblies have become accessible for a number of tree species, such as Picea abies [22], Pinus taeda [23], Picea glauca [24], and Pinus lambertiana [25]. These advancements have enabled GWAS based on exome capture [26], genotyping-by-sequencing (GBS) [27], SNP arrays from transcripts [28], and resequencing [29]. However, high-throughput SNP genotyping is not cost-effective when the number of target SNPs is in the hundreds or fewer. A relatively low-cost genotyping approach, such as the Kompetitive Allele-Specific PCR (KASP) assay, is to be preferred in such cases. The KASP assay is a genotyping method based on allele-specific amplification and high-sensitivity fluorescence detection. KASP-based genotyping is characterized by low cost, high throughput, and accurate biallelic genotyping of SNP and insertion-deletion (InDel) loci through specific matching of primer terminal bases. The method is widely used in MAS of soybean, wheat, and other plants [30,31]. However, no study has been performed on the genome-wide identification of KASP markers with high polymorphism among resource materials or varieties that are capable of background screening in L. olgensis.
In this study, 131 L. olgensis germplasm resources were used as materials. We analyzed the variation of L. olgensis needle and branch traits and identified associated genetic polymorphisms through phenotypic and GWAS analyses. We then developed KASP molecular markers for key SNP loci; genotyping these loci can help explain phenotypic variation among plants and reduce the workload of breeding. This study provides a valuable reference for the genetic improvement of L. olgensis.
2. Results
2.1. Variation Analysis of Phenotypic Data
In this study, the variation in growth traits among 131 genotypes of L. olgensis was assessed (Table 1). The average needle length (NL) was 1.45 cm, with a range of 1.01 to 2.16 cm and a coefficient of variation (CV) of 15.17%. The average needle water content (NWC) was 55.19%, with a range of 38.28% to 71.21% and a CV of 8.73%. The average number of needle fascicles (NF) was 9, with a range of 5 to 16 and a CV of 24.78%. The average biennial branch length (BBL) was 6.83 cm, with a range of 3.01 to 14.37 cm and a CV of 28.84%. The average Chl a relative content was 0.97%, with a range of 0.32% to 1.42% and a CV of 20.62%. The average Chl b relative content was 0.38%, with a range of 0.16% to 0.55% and a CV of 18.42%. The average Chl (a+b) relative content was 1.23%, with a range of 0.44% to 1.80% and a CV of 20.33%. The average Car relative content was 0.11%, with a range of 0.03% to 0.19% and a CV of 18.18%.
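The CV values reported above can be reproduced from raw measurements with a calculation along the following lines. This is a minimal sketch: the function name and the sample values are hypothetical, and it assumes the CV is the sample standard deviation (n − 1 denominator) divided by the mean, which the text does not state explicitly.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100.0


# Hypothetical needle lengths (cm), for illustration only; not study data.
needle_lengths_cm = [1.2, 1.4, 1.5, 1.6, 1.3]
print(f"CV = {coefficient_of_variation(needle_lengths_cm):.2f}%")
```

A CV is unit-free, which is why traits measured in centimetres, counts, and percentages can be compared on the same scale.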
Pearson’s correlation analysis was performed on needle and branch-related traits, and a cluster heat map of Pearson’s correlation coefficients among the eight traits was constructed (Figure 1). In terms of needle and branch phenotypes, NF was positively correlated with BBL (p < 0.01, r = 0.76) and with NL (p < 0.01, r = 0.29). Regarding photosynthetic pigment contents, there were highly significant positive correlations among Chl a, Chl b, Chl (a+b), and Car. The correlation coefficients among Chl a, Chl b, and Chl (a+b) were above 0.9. The Shapiro–Wilk test showed that the needle and branch traits followed normal or near-normal distributions (Figure 2) and were therefore suitable for GWAS.
2.2. Analysis of Resequencing Data and SNP Screening
The resequencing raw data generated from the 131 genotypes was 116,699.59 Gb, and the filtered clean bases were 16,441.49 Gb, with an error rate of 0.02%. The proportion of Q20 bases was 98.69%, and the proportion of Q30 bases was 95.43%, indicating that the constructed library quality met the requirements for subsequent analysis of the resequenced samples. Ten thousand sequences were randomly selected from the fastq file of each sample, and blastn was used to compare the sequences to the NCBI NT database for contamination assessment. The results showed no significant contamination with sequences from other species in the sample sequences. Subsequently, the resequencing data from the 131 genotypes were aligned to the reference genome of L. olgensis. The total sequencing data was 843,396,137 bp, the aligned sequences were 841,468,984 bp, and the alignment rate was 99.77%. Moreover, the average sequencing depth of the samples was 9.28×, and the coverage was 85.29%, indicating that the sequencing data largely covered the reference genome and could be utilized for further analyses (Table 2).
A total of 1,157,702,025 SNPs were detected during SNP calling. After filtering with VCFtools (genotype call rate ≥ 90%, MAF ≥ 0.05, and biallelic loci only), 243,090,868 SNPs were retained (detailed data not published; these data were used for the 60K SNP array construction).
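The three filtering criteria can be expressed per site as in the sketch below. It is an illustration only and not the VCFtools implementation: the function name, the 0/1/2 genotype coding, and the use of None for missing calls are assumptions made for this example.

```python
def passes_filters(genotypes, n_alleles, min_call_rate=0.90, min_maf=0.05):
    """Apply call-rate, MAF, and biallelic filters to one SNP site.

    genotypes: per-sample alternate-allele counts (0, 1, or 2), with None
               for a missing call (hypothetical coding for this sketch).
    n_alleles: number of distinct alleles observed at the site.
    """
    # Keep only biallelic loci.
    if n_alleles != 2:
        return False
    called = [g for g in genotypes if g is not None]
    # Genotype call rate: fraction of samples with a non-missing genotype.
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False
    # Minor allele frequency among the called (diploid) genotypes.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf


# Hypothetical site: 10 samples, all called, two heterozygotes (MAF = 0.10).
print(passes_filters([0, 0, 0, 0, 0, 0, 0, 0, 1, 1], n_alleles=2))
```

With 131 samples, a call rate of at least 90% means a site must be genotyped in at least 118 individuals to be retained.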
2.3. GWAS Analysis of Needle and Branch Traits
In this study, we integrated the phenotypic results for NL, NWC, NF, BBL, Chl a, Chl b, Chl (a+b), and Car with the WGRS data. We used a GLM to conduct GWAS employing the GAPIT package in R and created Manhattan plots and QQ plots representing the associated indicators (Supplementary Figure S1; Figure 3). A total of 161 SNPs were significantly associated with seven needle-related traits and one branch-related trait in L. olgensis, of which 153 SNP loci were associated with needle-related traits and eight SNP loci were associated with the branch-related trait. SNPs with a significant association with needle and branch traits are detailed in Supplementary Table S1.
2.4. Validation of Candidate SNPs by KASP Assay
Based on the results of the GWAS analysis, the 150 bp flanking sequences of the loci showing significant trait associations were extracted. The sequences of these loci were compared with the reference genome. Sequences with high copy numbers, high GC content, and many repeat sequences were removed (Figure 4), and 20 primers were designed and synthesized (Supplementary Table S2). Twenty KASP markers linked to needle and branch traits were used for genotyping L. olgensis, and the genotyping results of the markers are shown in Figure 5. In the population samples, 11 markers were polymorphic, whereas 6 markers did not show any polymorphism.
2.5. Development of KASP Markers
The polymorphic KASP primers were employed to assess associations of genetic variants with BBL in the population materials. One informative KASP marker, BSBM01000635.1_4693780, was identified. A natural population composed of 86 L. olgensis genotypes was assessed; the marker classified 82 genotypes, while the remaining 4 genotypes were not classified (N/N). BSBM01000635.1_4693780 differentiated 74 L. olgensis individuals with the genotype G/G and 8 with the genotype G/A (Figure 6).
The polymorphic KASP primers were also employed to assess genetic variants associated with Car content in the population. Two informative KASP markers, BSBM01000114.1_5114757 and BSBM01000114.1_5128586, were identified. The natural population composed of 86 L. olgensis genotypes was assessed. The BSBM01000114.1_5114757 marker classified 75 L. olgensis individuals, while the remaining 11 samples were not classified. It differentiated 58 individuals with the genotype GG, 5 with GT, and 12 with TT (Figure 7A). The BSBM01000114.1_5128586 marker classified all 86 L. olgensis individuals. It differentiated 79 individuals with the genotype GG, 5 with the genotype GA, and 2 with the genotype AA (Figure 7B).
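The genotype counts reported for the three markers translate directly into call rates and allele frequencies. The sketch below shows that arithmetic; the function name and the dictionary layout are hypothetical, and the counts are the ones given in the text for the 86-sample population.

```python
def summarize_marker(genotype_counts, n_samples):
    """Return (call_rate, allele_frequencies) for one KASP marker.

    genotype_counts: mapping of a two-letter diploid genotype to the number
                     of individuals carrying it, e.g. {"GG": 58, "GT": 5}.
    n_samples: total number of individuals assayed, including failed calls.
    """
    called = sum(genotype_counts.values())
    allele_counts = {}
    for genotype, n in genotype_counts.items():
        # Each individual contributes one copy per letter of its genotype.
        for allele in genotype:
            allele_counts[allele] = allele_counts.get(allele, 0) + n
    total_alleles = 2 * called
    freqs = {a: c / total_alleles for a, c in allele_counts.items()}
    return called / n_samples, freqs


markers = {
    "BSBM01000635.1_4693780": {"GG": 74, "GA": 8},
    "BSBM01000114.1_5114757": {"GG": 58, "GT": 5, "TT": 12},
    "BSBM01000114.1_5128586": {"GG": 79, "GA": 5, "AA": 2},
}
for name, counts in markers.items():
    call_rate, freqs = summarize_marker(counts, n_samples=86)
    print(name, f"call rate={call_rate:.3f}", {a: round(f, 3) for a, f in freqs.items()})
```

For BSBM01000114.1_5114757, for example, the T allele appears 5 + 2 × 12 = 29 times among 150 called alleles.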
3. Discussion
Plant phenotypic variation is the result of the interaction between genetic diversity and the environment, and it also manifests in plant adaptation to the environment [32]. Studying plant phenotypic diversity can help understand the size of genetic variation in plant populations and also help in understanding the mode, mechanism, and influencing factors of plant adaptive evolution [33]. Perennial tree species have abundant phenotypic and genetic variation, which determines their adaptability to the environment and is the basis for maintaining the long-term stability of the forest ecosystem [34]. In this study, we determined the extent of genetic variation in 131 L. olgensis genotypes. Four traits (NF, BBL, Chl a, and Chl (a+b)) varied greatly in the population, each with a coefficient of variation greater than 20% (Table 1). It is possible that the needle traits have been differentially influenced by complex climatic conditions, resulting in apparent differences in needle-related traits among the populations [35]. Chlorophyll plays a dominant role in photosynthesis, reflecting the plant’s ability to utilize and regulate light energy. In contrast, carotenoids play a secondary role as auxiliary pigments, which absorb visible light, after which light energy is transferred to chlorophyll, further improving the photosynthetic efficiency [36]. Plants with a greater chlorophyll content possess a more potent capacity to absorb light energy for the process of photosynthesis. The relatively high content of photosynthetic pigments in this study reflects the strong photosynthetic capacity of L. olgensis to a certain extent [37].
In the current study, we analyzed larch genotypes that are suitable for growing in northeast China. Such association studies have not previously explored the needle and branch-related traits of L. olgensis. Therefore, our results may represent a promising and valuable resource of excellent loci associated with growth traits. However, the genomes of coniferous species are large and complex [22,23,24,25], with the L. kaempferi genome assembly at 10.97 Gb [38]. The development of molecular markers using traditional methods is laborious and lengthy, while the number of markers obtained is not sufficient for the required sensitivity in association analyses [39]. Plant breeding techniques developed by leaders in the field are based on MAS approaches. Prior to MAS, the initial step involves the identification of DNA marker loci in the genome of forest trees, which are linked to specific wood traits. Variation in a limited number of genes can often lead to substantial phenotypic alterations [38]. SNP molecular markers are widely used for the construction of genetic maps, quantitative trait mapping analysis, and GWAS [40]. With the publication of the genomes of multiple species, the WGRS results can be compared with the existing reference genome sequences to identify genetic variations such as SNPs, InDels, and structural variants (SV) in the whole genome [21,41]. In coniferous trees, an SNP genotyping array developed by resequencing successfully genotyped 480 individuals [42]. Gulyaev et al. used more than 1 TB of WGRS data from 70 Salix taxa to identify SNPs on the autosomes and the chloroplast genomes for tree species phylogenetic analyses and to identify variants associated with different sex-determination systems in major groups of the genus [43].
A large number of SNP loci can be identified in a relatively small number of genotypes in conifers [44]. De la Torre et al. [45] identified 799 significant associations with cold tolerance-related traits by GWAS in 217 genotypes in Douglas-fir. The GoldenGate assay was used to genotype the offspring from three-generation outbred (G2) and inbred (F2) pedigrees. Based on 98 markers segregating in both pedigrees, a consensus map containing 357 SNPs from 292 different loci was generated [46]. In this study, the studied populations were mostly composed of individuals representing a species with a narrow distribution range. We selected 131 unrelated genotypes representing different natural forest populations that could represent the core germplasm of the northeastern part of China as determined by their limited genetic similarity. The results of the GWAS showed that SNP loci were significantly associated with abundant phenotypic variation. Thus, the results suggest that sample size is not the most important factor in GWAS, and the genetic relationships among samples should be the focus of sample selection. Therefore, samples representing the core germplasm resources can be used as the materials for association analysis of the quantitative traits even when the availability of genetic materials is limited [47].
Currently, the assembly quality of the larch genome is inadequate, and the corresponding functional genes cannot be obtained [48]. KASP marker technology has a wide range of applications in various fields. It is extensively used for multiple purposes, including the identification of germplasm, the investigation of genetic relationships, the facilitation of breeding through molecular markers, the construction of genetic maps, and the mapping of genes [49,50,51]. KASP markers have been applied to locate candidate genes for yield traits such as plant height and thousand-grain weight [18]. In the current study, we developed three KASP markers, BSBM01000635.1_4693780, BSBM01000114.1_5114757, and BSBM01000114.1_5128586, associated with BBL and Car. The development of KASP markers related to needle and branch traits of L. olgensis is of great practical significance for molecular marker breeding of L. olgensis.
4. Materials and Methods
4.1. Plant Materials
The population for the phenotype-genotype association analyses was derived from L. olgensis provenance trials established in 1980 in Cuohai county, Heilongjiang province (122°51′ E, 47°27′ N), with provenances originally collected from 11 sites in Jilin and Heilongjiang provinces (Table 3). Seedlings were planted using a randomized complete block design with a spacing of 1 × 2 m [52]. In this study, representative individuals were selected for the analysis of variation in needle and branch traits and for whole genome resequencing (WGRS).
4.2. Phenotypic Traits Determination
In July 2023, which is the period of active growth, the traits of current-year needles and branches were not fully developed; thus, the traits of needles on biennial branches developed in the previous year were measured. The needle and biennial branch lengths were determined from each genotype using vernier calipers (each sample comprised 15 biological replicates) [53]. The number of needle fascicles on biennial branches was counted (each sample comprised 15 biological replicates). The fresh weight of the needles was weighed using an analytical balance. The measured needles were placed in a paper bag, sterilized at 105 °C for 15 min, and then dried at 85 °C in an oven to constant weight. The needles were placed in a dryer and cooled to room temperature. The dry weight of the needles was measured using an analytical balance, and the needle water content was calculated simultaneously (each sample comprised five biological replicates) [54]. Referring to the method of Cai et al., the needle water content was calculated as follows:
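The equation itself is not reproduced in this text. As a minimal sketch, the calculation below assumes the common fresh-weight basis, NWC (%) = (FW − DW) / FW × 100, where FW is the fresh weight and DW the constant dry weight; this basis is an assumption and may differ from the formula of Cai et al. The function name is hypothetical.

```python
def needle_water_content(fresh_weight_g, dry_weight_g):
    """Return needle water content (%) on a fresh-weight basis.

    ASSUMPTION: NWC = (FW - DW) / FW * 100; the source formula is not shown.
    """
    if fresh_weight_g <= 0:
        raise ValueError("fresh weight must be positive")
    if not 0 <= dry_weight_g <= fresh_weight_g:
        raise ValueError("dry weight must lie between 0 and the fresh weight")
    return (fresh_weight_g - dry_weight_g) / fresh_weight_g * 100.0


# Hypothetical weights (g), for illustration only; not study data.
print(f"NWC = {needle_water_content(2.0, 0.9):.2f}%")
```

On this basis the value is bounded between 0% and 100%, which is consistent with the reported NWC range of 38.28% to 71.21%.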
Chlorophyll a (Chl a), chlorophyll b (Chl b), total chlorophyll (Chl (a+b)), and carotenoid (Car) contents were determined by spectrophotometry (HD-UV90, China). The concentrations of Chl a, Chl b, Chl (a+b), and Car were calculated according to Lichtenthaler [55]:
where A is the absorption at the corresponding wavelength; Chl a is the chlorophyll a concentration; Chl b is the chlorophyll b concentration; Chl (a+b) is the total chlorophyll concentration; Car is the total carotenoid concentration. Pigment concentrations were expressed as μg mL⁻¹ of diluted extract.
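The equations are not shown in this text, and the extraction solvent is not stated. Purely as an illustration, the sketch below uses the coefficients Lichtenthaler (1987) gives for extracts in 80% acetone; if another solvent was used, the wavelengths and coefficients differ. The function and argument names are hypothetical.

```python
def pigment_concentrations(a663_2, a646_8, a470):
    """Return (Chl a, Chl b, Chl (a+b), Car) in ug/mL of extract.

    ASSUMPTION: 80% acetone extract, Lichtenthaler (1987) coefficients.
    a663_2, a646_8, a470: absorbance at 663.2, 646.8, and 470 nm.
    """
    chl_a = 12.25 * a663_2 - 2.79 * a646_8
    chl_b = 21.50 * a646_8 - 5.10 * a663_2
    # Equivalent to 7.15 * A663.2 + 18.71 * A646.8.
    chl_total = chl_a + chl_b
    # Total carotenoids (xanthophylls + carotenes).
    car = (1000.0 * a470 - 1.82 * chl_a - 85.02 * chl_b) / 198.0
    return chl_a, chl_b, chl_total, car


# Hypothetical absorbance readings, for illustration only; not study data.
chl_a, chl_b, chl_total, car = pigment_concentrations(0.5, 0.2, 0.4)
print(f"Chl a={chl_a:.3f}, Chl b={chl_b:.3f}, Chl (a+b)={chl_total:.3f}, Car={car:.3f}")
```

The carotenoid term subtracts the chlorophyll contribution at 470 nm, so Chl a and Chl b must be computed first.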
4.3. DNA Extraction
In July 2023, genomic DNA was extracted from L. olgensis needles using the magnetic beads method [56]. DNA quality and concentration were assessed using 1.0% agarose gel electrophoresis and an ND-1000 spectrophotometer, respectively. After the extraction quality was confirmed, the DNA solution was diluted to 20–100 ng/μL as the working solution and stored at −20 °C for subsequent analyses.
4.4. WGRS and SNP Calling
The DNA of 131 L. olgensis genotypes was resequenced using the DNBSEQ-T7 platform (MGI, Shenzhen, China) with an expected target coverage of 10× at Huazhi Bio-Tech (Changsha, China). The raw data were filtered using the fastp software (v2.20.0) (parameter Q30; all other parameters were left at their defaults) to obtain clean read data [57]. The clean reads were then aligned to the L. kaempferi reference genome (GCA_027924585.1) using the Burrows-Wheeler Aligner (BWA) software (version BWA-0.7.17(r1188)) [38]. SNP calling was performed using the Genome Analysis Toolkit software (GATK4.3.0.0) [58]. Finally, raw SNPs were filtered using the VCFtools software (v0.1.15) [59] based on the following criteria: genotype call rate ≥ 0.9, minor allele frequency (MAF) ≥ 0.05, and only loci with two alleles retained; all INFO fields were retained without filtering. The high-quality SNPs obtained through these filtering criteria were used for GWAS analysis.
4.5. Genome-Wide Association Analysis
GWAS was performed using the R package rMVP (version 1.1.1) with a general linear model (GLM) [60]. SNP loci were considered significantly associated with a target trait when −log10(p) ≥ 8. The QQ and Manhattan plots were drawn using the R package CMplot (version 4.5.1), and Manhattan plots were also generated with rMVP to visualize the GWAS results.
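The significance rule amounts to keeping SNPs whose p-value is at most 10⁻⁸. The sketch below illustrates that step only; it does not reproduce rMVP, and the function name, SNP identifiers, and p-values are hypothetical.

```python
import math


def is_significant(p_value, threshold=8.0):
    """Return True when -log10(p) meets or exceeds the threshold."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value) >= threshold


# Hypothetical GWAS output as (SNP id, p-value) pairs; not study data.
results = [("snp_1", 3.2e-10), ("snp_2", 4.5e-6), ("snp_3", 7.9e-9)]
hits = [snp for snp, p in results if is_significant(p)]
print(hits)
```

Working on the −log10 scale matches the y-axis of a Manhattan plot, so the threshold appears there as a horizontal line at 8.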
4.6. KASP Genotyping Assay
Using the Primer-BLAST function of NCBI (https://www.ncbi.nlm.nih.gov/, accessed on 19 April 2024), KASP-PCR amplification primers were designed based on the SNP loci. Twenty primer sets were designed for SNPs significantly associated with L. olgensis BBL, Chl (a+b), and Car. Each primer set consisted of two allele-specific forward primers, F1 and F2, and a common reverse primer, R. F1 and F2 contained the 6-carboxyfluorescein (FAM) and hexachloro-6-methylfluorescein (HEX) fluorescent linker sequences (underlined), respectively.
KASP marker validation was performed on 96 samples in Douglas Scientific’s Array Tape system. The temperature cycling conditions were pre-denaturation at 94 °C for 15 min; 10 touchdown cycles of denaturation at 95 °C for 20 s and annealing/extension at 65–56 °C for 1 min, with a decrease of 0.8 °C per cycle; and 30 cycles of denaturation at 94 °C for 20 s and extension at 57 °C for 1 min.
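The touchdown phase of this program can be made explicit by listing the annealing/extension temperature of each cycle. This is a minimal sketch that assumes the first touchdown cycle runs at 65 °C and each subsequent cycle is 0.8 °C lower; the function name is hypothetical.

```python
def touchdown_temperatures(start_c=65.0, step_c=0.8, cycles=10):
    """Return the annealing/extension temperature (deg C) for each touchdown cycle.

    ASSUMPTION: cycle 1 runs at start_c and every later cycle drops by step_c.
    """
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


temps = touchdown_temperatures()
print(temps)
```

Under this assumption the tenth touchdown cycle runs at 57.8 °C, one further 0.8 °C step above the 57 °C used in the constant-temperature cycles that follow.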
4.7. Statistical Analysis of Needle Traits
IBM SPSS Statistics 26 software was used to calculate the mean value, standard deviation, and coefficient of variation (CV) of each measured trait. The Origin 2021 software was used to perform the frequency distribution and correlation analyses and to generate the figures.
5. Conclusions
This study identified 161 SNP loci associated with seven needle-related traits and one branch-related trait in L. olgensis. Genotypic and phenotypic variability was combined to conduct GWAS, and KASP markers were developed based on the significantly associated loci. These KASP markers could be used to genotype L. olgensis individuals accurately. In conclusion, this work has enriched the phenotypic and genotypic data of L. olgensis and provided valuable information for further studies on the regulation of needle and branch traits in L. olgensis. Additionally, further research is needed to explore these loci and gain a deeper understanding of the underlying molecular mechanisms in regulating the needle and branch traits of L. olgensis. These findings contribute to a better understanding of the regulatory mechanisms of coniferous tree needle and branch-related traits, offering a scientific basis for optimizing larch’s growth traits. This study provides valuable genetic resources and a solid theoretical basis for molecular design breeding and marker-assisted breeding of L. olgensis. It can improve the yield of L. olgensis with excellent wood properties, reduce the workload of breeders, and accelerate the breeding process of trees.
References
1. Zhang H., Zhou X., Gu W., Wang L., Li W., Gao Y., Wu L., Guo X., Tigabu M., Xia D. Genetic stability of Larix olgensis provenances planted in different sites in northeast China. For. Ecol. Manag. 2021, 485, 118988. doi:10.1016/j.foreco.2021.118988
2. Ying J., Weng Y., Oswald B.P., Zhang H. Variation in carbon concentrations and allocations among Larix olgensis populations growing in three field environments. Ann. For. Sci. 2019, 76, 99. doi:10.1007/s13595-019-0877-0
3. Pan Y., Li S., Wang C., Ma W., Xu G., Shao L., Zhao X., Jiang T. Early evaluation of growth traits of Larix kaempferi clones. J. For. Res. 2018, 29, 1031–1039. doi:10.1007/s11676-017-0492-6
4. Teodosiu M., Mihai G., Ciocîrlan E., Curtu A.L. Genetic characterisation and core collection construction of European larch (Larix decidua Mill.) from seed orchards in Romania. Forests 2023, 14, 1575. doi:10.3390/f14081575
5. Jadwiszczak K.A., Mazur M., Bona A., Marcysiak K., Boratyński A. Three systems of molecular markers reveal genetic differences between varieties sabina and balkanensis in the Juniperus sabina L. range. Ann. For. Sci. 2023, 80, 45. doi:10.1186/s13595-023-01211-w
6. Nunziata A., Ruggieri V., Petriccione M., De Masi L. Single Nucleotide Polymorphisms as Practical Molecular Tools to Support European Chestnut Agrobiodiversity Management. Int. J. Mol. Sci. 2020, 21, 4805. doi:10.3390/ijms21134805. PMID: 32646057. PMCID: PMC7370276
7. Zhu X., Sun F., Sang M., Ye M., Bo W., Dong A., Wu R. Genetic architecture of heterophylly: Single and multi-leaf genome-wide association mapping in Populus euphratica. Front. Plant Sci. 2022, 13, 870876. doi:10.3389/fpls.2022.870876. PMID: 35783952. PMCID: PMC9240601
8. Ahmar S., Ballesta P., Ali M., Mora-Poblete F. Achievements and challenges of genomics-assisted breeding in forest trees: From marker-assisted selection to genome editing. Int. J. Mol. Sci. 2021, 22, 10583. doi:10.3390/ijms221910583. PMID: 34638922. PMCID: PMC8508745
