Integrative Genome-Wide Association and Transcriptome Analyses Identify Candidate Genes for Salt Tolerance During Cotton Germination
Yin Wang, Yilei Long, Shen Jin, Yinan Yang, Shixiao Fang, Xiutong Wu, Teng Liu, Xiantao Ai

TL;DR
Researchers used genome and gene activity data to find genes linked to salt tolerance in cotton during germination.
Contribution
Integration of genome-wide association and transcriptome data identified candidate genes for salt tolerance in cotton germination.
Findings
1277 SNP markers were identified and mapped to 94 QTLs related to salt tolerance traits in cotton.
73 candidate genes were identified, with three being differentially expressed under salt stress.
Expression profiles of three genes showed significant variation across time points under salt stress.
Abstract
Genome-wide association analysis and transcriptomics were used to investigate salt tolerance traits during germination in 300 Gossypium hirsutum L. germplasm accessions, with the objective of identifying genes and molecular markers associated with salt tolerance. Under 200 mmol L−1 NaCl stress, six traits were evaluated, germination rate, root length, shoot length, root fresh weight, shoot fresh weight, and total fresh weight, as well as their respective salt tolerance indices. A total of 1277 significantly associated single-nucleotide polymorphism (SNP) markers were identified and mapped to 94 quantitative trait loci (QTLs). Of these, 49 QTLs were detected by three or more analytical models, and three QTLs were prioritized for further investigation. Subsequent analysis of these QTLs identified 73 candidate genes potentially involved in cotton salt tolerance. Integration of…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10- —Regional Science Fund of the National Natural Science Foundation of China
- —Tianshan Elite Program, Young Elite Talent Project—Young Science and Technology Innovation Talent
- —Major Science and Technology Special Project of Xinjiang Uygur Autonomous Region
- —Key R&D Special Project of the Autonomous Region
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsResearch in Cotton Cultivation · Plant Stress Responses and Tolerance · Plant Molecular Biology Research
1. Introduction
Soil salinization represents a significant barrier to agricultural development. The intensification of this process, driven by global climate change and human activities, has accelerated the depletion of arable land resources [1]. Crucially, seed germination represents the most critical bottleneck for successful crop establishment, as it marks the sensitive transition from a quiescent state to active growth. In saline and drought-prone environments, the adaptability of seeds during this early phase is the primary determinant of plant survival [2,3]. Saline–alkali stress severely impairs plant growth, development, and cellular metabolism [4] by disrupting ionic permeability and osmotic balance. For cotton (Gossypium), which is frequently cultivated in regions characterized by high soil salinity, resistance during the germination and seedling stages is particularly vital yet constitutes the weakest link in its life cycle [5,6]. High concentrations of soluble salts decrease soil osmotic potential, significantly impairing water uptake by seeds and roots [7]. This disruption of ion homeostasis interferes with essential metabolic processes and substantially reduces root vitality and photosynthetic capacity [8]. While cotton’s salt tolerance typically increases after the three-leaf stage [9], the damage sustained during germination directly influences the population density, subsequent vegetative growth, and ultimate fiber quality [10]. Therefore, identifying cotton germplasm with robust salt tolerance during the germination stage is a prerequisite for enhancing crop resilience and securing yields in marginal lands.
Salt stress is among the most detrimental abiotic factors affecting plant productivity. Globally, saline–alkali soils cover approximately 7% of terrestrial land, and in China, the area of saline–alkali land exceeds 100 million hectares and continues to expand [11]. Under high-salt stress, plants have evolved complex regulatory networks to enhance survival [12]. Plants growing in saline–alkali soils experience severe ionic stress, which disrupts normal ion distribution and undermines water potential homeostasis [13]. Cotton (Gossypium) is the world’s most significant natural fiber crop and a cornerstone of the global textile industry, possessing immense strategic and economic value. As a moderately salt-tolerant species, cotton plays a vital role in utilizing marginal lands and maintaining ecological stability in arid regions [14]. In China, cotton cultivation remains a national priority; according to the National Bureau of Statistics, the planting area in 2025 reached 2979.2 thousand hectares, with a total output of 6.641 million tons—a 7.7% increase over 2024. Notably, the Xinjiang region now accounts for 90.2% of China’s total cotton production [15]. underscoring the crop’s concentration in areas characterized by saline–alkali soils. While the deployment of insect-resistant and nitrogen-efficient varieties has enhanced the crop’s environmental sustainability [16], excessive salt stress remains a primary constraint. Recent studies establish a salinity threshold of 7.7 dS m^−1^ for cotton [17,18]; beyond this point, yields decline sharply, with a 0.5% soil salt content potentially leading to a 50% reduction in productivity [19]. Consequently, deciphering the mechanisms of salt tolerance during the sensitive germination stage is essential for securing global cotton supplies and optimizing land-use efficiency.
Genome-wide association studies (GWASs), which rely on linkage disequilibrium (LD), employ high-density molecular markers to identify loci associated with phenotypic variation, thereby elucidating the genetic basis of complex traits [5]. Since Risch first introduced the concept [20], GWASs have demonstrated reliability in identifying candidate genes across various species [21,22]. In recent years, GWASs have been extensively applied to crops such as rice [23], soybean [24], and maize [25]. The refinement of cotton genetic maps and advancements in statistical analysis software have facilitated the broader application of GWAS in cotton research [26]. Salt tolerance in cotton is a polygenic quantitative trait, and candidate genes can be effectively identified through quantitative trait locus (QTL) mapping. For instance, Yuan et al. [27] conducted GWAS on 196 upland cotton accessions to identify QTLs and candidate genes associated with salt tolerance during germination, reporting 33 significant single-nucleotide polymorphisms (SNPs), including 17 detected in at least two environments. While these foundational studies provide valuable insights, numerous GWASs have predominantly focused on terminal traits such as yield and fiber quality. Consequently, the genetic architecture of salt tolerance specifically during the cotton germination stage remains insufficiently characterized. Furthermore, previous efforts have largely relied on single-omics frameworks. By integrating GWAS with high-resolution transcriptomic data, our study advances beyond conventional single-method approaches. This integrative multi-omics strategy provides a more robust analytical framework to pinpoint candidate genes and elucidate the underlying molecular networks governing early-stage salt tolerance.
Although GWAS enables robust genomic localization, it often lacks the resolution to identify specific causal genes within large QTL intervals. Specifically, SNPs identified through GWAS often reside within broad linkage disequilibrium (LD) regions, making it challenging to distinguish whether a specific SNP is functionally causal or merely associated with the true causative locus. Furthermore, a single LD interval may encompass multiple genes, which often leads to potential false-positive results. Transcriptome analysis (RNA-seq) serves as a powerful complementary approach. Compared with traditional sequencing methods, second-generation transcriptomic sequencing provides higher sensitivity, greater throughput, and the ability to detect novel genes and alternative splicing events [28,29]. Rather than acting merely as a confirmatory tool, integrating transcriptional profiling with GWAS serves as a rigorous prioritization strategy. By determining the expression levels of genes within associated loci, this approach filters out non-responsive candidates within LD blocks and narrows the scope for biological validation, thereby significantly enhancing the accuracy of gene identification.
Consequently, integrating GWAS and transcriptomics has become an effective strategy for the genetic dissection of complex traits. Applying this combined approach to identify salt-tolerant genes and design molecular markers is crucial for enhancing cotton’s resilience. For instance, Xu et al. [30] conducted GWAS on 217 upland cotton varieties and, by incorporating transcriptomic data, identified three candidate genes associated with salt tolerance. Similarly, He et al. [31] combined these methods to identify 8 candidate genes associated with salt stress in alfalfa (Medicago sativa L.). Kim et al. [32] also utilized this integrated approach to predict key transcription factors (e.g., OsMYB48) associated with salt tolerance in rice seedlings. Beyond identification, high-confidence candidate genes derived from such multi-omics integration serve as optimal targets for CRISPR/Cas9-based functional validation. By developing knockout or overexpression models, the biological functions of these genes can be rapidly characterized. Furthermore, these functionally validated genes provide a theoretical basis for precision breeding, enabling the targeted enhancement of salt-resilience while preserving the elite genetic backgrounds of existing cultivars. This holistic pipeline, from locus discovery to potential application, significantly strengthens the translational value of genomic research in cotton.
This study utilized 300 landrace cotton germplasm accessions as experimental materials to investigate the genetic basis of salt tolerance. To address the previously identified knowledge gaps, our research was designed with the following specific objectives: (1) to characterize the phenotypic variation and stress response patterns in salt tolerance during the critical cotton germination stage; (2) to identify significant genomic loci and associated SNPs through a high-resolution GWAS; and (3) to prioritize high-confidence candidate genes by integrating genetic mapping findings with transcriptomic data. Collectively, these investigations aim to elucidate the molecular mechanisms underlying cotton salt tolerance and provide a robust theoretical foundation for the development of salt-resilient cotton cultivars through precision breeding.
2. Results
2.1. Phenotypic Analysis
A gradient stress test on 16 randomly selected cotton accessions revealed that the Salt Injury Index (SII) for six traits approached 0.5 under 200 mmol L^−1^ NaCl (Figure A1); thus, this concentration was selected for subsequent experiments. Salt Tolerance Indices (STIs) were calculated for the six traits and normalized using the membership function method. The average membership function value was used as the overall criterion for evaluating salt tolerance. Based on these values, the 300 germplasm accessions were classified into five categories: Highly Salt-Tolerant (HST, 48 accessions), Salt-Tolerant (ST, 169), Moderately Salt-Tolerant (MST, 40), Low Salt-Tolerant (LST, 16), and Non-Salt-Tolerant (NST, 27) [33]. This classification was subsequently utilized in downstream analyses to evaluate the enrichment and distribution of specific candidate gene haplotypes across accessions with differing salt tolerance capacities.
Descriptive statistics (Table 1) showed that the coefficients of variation (CV) for these traits ranged from 31.93% to 48.52% under salt stress, indicating substantial phenotypic variation among the accessions. Normality tests (Figure 1) indicated that most traits followed a normal distribution (absolute skewness and kurtosis < 1), except for Germination Rate (GR) and Root Fresh Weight (RFW), which deviated from normality. These findings suggest that osmotic stress differentially regulates salt tolerance traits. While GR and RFW deviated from normal distributions, the General Linear Model (GLM) and Mixed Linear Model (MLM) employed in our GWAS are generally robust to such moderate non-normality, mitigating potential biases in the association results. Correlation analysis (Figure 2) revealed positive correlations among all traits. Total Fresh Weight (TFW) and Shoot Fresh Weight (SFW) exhibited the highest correlation coefficient (0.99), followed by the correlation between the STIs of TFW and SFW (0.99). The weakest correlation was observed between the STI of SFW and GR (r = 0.42).
2.2. Genome-Wide Association Analysis of Salt Tolerance-Related Traits in Cotton
Understanding population structure is critical for inferring the origin, differentiation, and gene flow among subpopulations. ADMIXTURE analysis was used to construct the population genetic structure. The cross-validation (CV) error was minimized at K = 6, yielding six subpopulations from the 300 cotton accessions. Specifically, the six subpopulations (Subpopulations 1 to 6) consisted of 57, 36, 28, 85, 69, and 25 accessions, respectively. Analysis of the geographical distribution indicated a correspondence between the population structure and the geographic origins of the accessions. Subpopulation 1 was primarily concentrated in the Northwest Inland region, Subpopulation 2 in the Yangtze River basin, and Subpopulation 6 in the Liaoning specific early-maturing region. The remaining subpopulations (3, 4, and 5) were predominantly located in the Yellow River basin. This classification was further corroborated by phylogenetic analysis and PCA. Based on PopLDdecay, the maximum r^2^ was 0.86, and the linkage disequilibrium (LD) decay distance (where r^2^ drops to half) was calculated as 454.6 kb [34]. This LD decay distance is shorter than the 742.7 kb reported by Ma et al. [35] and the 1000 kb reported by Fang et al. [36], but longer than the 296 kb reported by Wang et al. [37].
GWAS was conducted for salt tolerance traits and indices using four models (GLM+P, GLM+Q, MLM+P+K, and MLM+Q+K) based on the identified effective SNPs (Figure 3). A total of 1277 significant SNPs were identified across 26 chromosomes, including 1111 for traits and 166 for indices. Based on the genomic LD decay distance of 454.6 kb, these SNPs were integrated into 43 QTLs for traits (explaining 6.71–11.88% of phenotypic variation, PVE, with an average of 8.03%) and 51 QTLs for indices (explaining 6.73–9.81% PVE, with an average of 7.38%). These moderate PVE values are consistent with the polygenic control of salt tolerance. Furthermore, they align with the typical effect range reported for salt tolerance QTLs in cotton by Yuan et al. [27] (5.98% to 10.76% PVE, with an average of 8.21%).
To evaluate the robustness of these associations, we analyzed the detection frequency of the 94 identified QTLs across the four models. Specifically, 22 QTLs were uniquely detected by a single model, and 23 QTLs were shared across two models. Loci consistently detected by three or more models were explicitly defined as “stable QTLs”. Among the total, 49 stable QTLs were detected by at least three models (27 for traits and 22 for indices). As shown in Figure 4, these stable QTLs were distributed across most chromosomes, with the exception of A04, A09, A11, A12, D05, D07, D08, D09, and D12. Chromosome A03 contained the highest number of stable QTLs (7), followed by A06 (6), while A01, A02, A13, D03, D11, and D13 each contained one QTL. Specifically, stable QTLs associated with salt tolerance indices were most abundant on chromosome A03 (5), whereas those associated with salt tolerance traits were most abundant on chromosome D01 (4).
2.3. Candidate Gene Prediction
To identify candidate loci for salt tolerance, we prioritized QTLs from the 49 stable QTLs based on the following explicit selection criteria: (1) consistent detection across multiple GWAS models; (2) co-localization, where the same lead SNP is associated with multiple phenotypic traits; and (3) a high phenotypic variance explained (PVE). Applying these criteria, we focused on 9 prioritized QTLs where multiple traits were associated with the same lead SNP (Table 2). Specifically, snp517602 was associated with qSGR-Gh6.2, qSSL-Gh6.3, and qSTFW-Gh6.3; snp1470334 with qSFW-Gh10.1, qSL-Gh10.2, and qTFW-Gh10.1; and snp2322113 with qSFW-Gh19.1, qSL-Gh19.1, and qTFW-Gh19.1. For subsequent candidate gene prediction, the QTL with the highest PVE at each of these three loci was selected as the representative trait, which were qSSL-Gh6.3, qSFW-Gh10.1, and qSL-Gh19.1, respectively.
For qSSL-Gh6.3 on chromosome A06 (Lead SNP: snp517602, PVE 7.94%), the local Manhattan plot (Figure 5b) peaked at 12.2–13.1 Mb. LD heatmap analysis (Figure 5d) narrowed the candidate region to 200 kb, containing three genes (Table A1). Haplotype analysis showed that accessions carrying the TT haplotype exhibited a significantly higher STI for Shoot Length compared to those with the CC or CT haplotypes. Moreover, frequency distribution analysis indicated that the TT haplotype accounted for the highest proportion in salt-tolerant categories, with the proportion decreasing as salt tolerance declined (Figure 5a,c).
For qSFW-Gh10.1 on chromosome A10 (Lead SNP: snp1470334, PVE 8.43%), the peak was at 103.4–104.3 Mb. The 200 kb region contained 26 genes (Table A1). Accessions carrying the CC haplotype exhibited a significantly higher Shoot Fresh Weight, and distribution analysis showed this haplotype was most prevalent in salt-tolerant categories (Figure 6a,c).
For qSL-Gh19.1 on chromosome D06 (Lead SNP: snp2322113, PVE 7.81%), the 150 kb region contained 44 genes (Table A1). Accessions carrying the CC haplotype exhibited significantly longer shoots, with the CC haplotype showing the highest proportion in salt-tolerant categories (Figure 7a,c).
2.4. Transcriptome Data Quality Analysis
It is important to note that the transcriptome analyses in this study were conducted exclusively on a single genotype and tissue. Five days post-germination, seedlings of the salt-tolerant accession CN4 (Zhongmian No. 4) were treated with 200 mmol L^−1^ NaCl or distilled water (control). Root tissues were collected at 1, 6, and 24 h with three biological replicates. Sequencing of the 18 libraries yielded 40.9–58.9 million clean reads per sample, with clean bases exceeding 6.13 Gb and an average GC content of 43.1%. The Q30 values for all samples were >96.49%, well above the standard threshold of 80%, demonstrating the high accuracy and reliability of the sequencing data. Furthermore, at least 95.87% of the clean reads were successfully mapped to the TM-1 reference genome (Table A2).
At 1, 6, and 24 h post-salt stress treatment, a total of 7701 (3916 up-regulated and 3785 down-regulated), 9269 (4559 up-regulated and 4710 down-regulated), and 11,646 (5407 up-regulated and 6239 down-regulated) differentially expressed genes (DEGs) were identified between the stress and control groups, respectively (Figure 8a). Venn diagram analysis revealed overlaps between time points: 2712 DEGs were shared between 1 h and 6 h, 2785 between 1 h and 24 h, and 3725 between 6 h and 24 h. Notably, 1346 DEGs were common to all three time points (Figure 8b).
2.5. KEGG and GO Enrichment Analysis
To monitor the expression of salt tolerance-related genes, KEGG and GO enrichment analyses were performed on the 1346 common DEGs. KEGG analysis indicated that these DEGs were enriched in 94 pathways. Figure 9a displays the top 20 significantly enriched pathways (p < 0.05). Most DEGs were involved in biosynthesis and metabolism, such as “Phenylpropanoid biosynthesis” (ghi00940, 24 DEGs), “Starch and sucrose metabolism” (ghi00500, 11 DEGs), “Pyruvate metabolism” (ghi00620, 11 DEGs), “Cysteine and methionine metabolism” (ghi00270, 9 DEGs), and “Biosynthesis of various plant secondary metabolites” (ghi00999, 15 DEGs). Additionally, several DEGs were associated with signal transduction, including “MAPK signaling pathway—plant” (ghi04016, 16 DEGs) and “Phosphatidylinositol signaling c” (ghi04070, 6 DEGs).
The Gene Ontology (GO) database classifies gene functions into Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). We observed significant enrichment in 318 BP, 198 CC, and 78 MF terms (p < 0.05). Figure 9b lists the top 10 enriched terms in each category. For BP, the main enrichments were in “photosynthesis” (GO:0015979, 16 DEGs) and “response to oxidative stress” (GO:0006979, 15 DEGs). In CC, most DEGs were linked to “membrane protein complex” (GO:0098796, 19 DEGs) and “photosynthetic membrane” (GO:0034357, 17 DEGs). For MF, the highest numbers (16 DEGs each) appeared in “peroxidase activity” (GO:0004601), “antioxidant activity” (GO:0016209), “electron transfer activity” (GO:0009055), and “oxidoreductase activity, acting on peroxide as acceptor” (GO:0016684).
2.6. Screening Candidate Genes by Integrating GWAS and RNA-seq
To further screen for candidate genes associated with salt tolerance, we integrated the GWAS and RNA-seq results. Among the 73 candidate genes identified by GWAS, three overlapped with the common DEGs detected across all three sampling time points in the RNA-seq analysis (Table 3). Expression profiling revealed that these three candidates (GH_D06G0273, GH_D06G0283, and GH_A10G2036) exhibited significant differential expression (based on FPKM values) in all comparison groups (S1 vs. C1, S6 vs. C6, and S24 vs. C24) (Figure 9c). Consequently, these three genes were identified as prioritized candidate genes regulating salt tolerance during cotton germination within the specific context of the CN4 root transcriptome, precluding direct extrapolation to population-wide expression patterns.
3. Discussion
3.1. Optimization of Salt Stress Evaluation and Phenotypic Variation
Salt stress is a primary abiotic constraint limiting crop productivity, affecting over 1 billion hectares globally, with soil salinization in China showing an expanding trend [38]. Establishing an optimal stress intensity is the prerequisite for accurate germplasm evaluation. In this study, we determined that 200 mmol L^−1^ NaCl is the critical threshold for distinguishing salt tolerance in cotton germination. This selection is based on the principle that when the stress concentration ensures an intermediate phenotype (SII ≈ 0.5), the phenotypic variance within the population is often maximized. Such substantial variation is advantageous for GWAS as it facilitates the detection of genetic effects in polygenic traits while avoiding the excessive statistical bias caused by extreme concentrations (complete inhibition or lack of stress). This finding refines previous methodologies; for instance, Dilnur et al. [39] used 150 mmol L^−1^, while Sun et al. [38] found that 1.2% NaCl completely inhibited germination in sensitive accessions, ultimately selecting 1.0%. The variation in optimal concentrations across studies likely shoots from differences in the genetic background of the populations used. However, it must be acknowledged that the genetic networks and survival strategies activated at this semi-lethal dose may differ from plant responses under milder or field-relevant salinity conditions, where adaptive growth rather than immediate survival is the primary focus.
3.2. Identification of Stable QTLs Associated with Salt Tolerance
Seed germination is a critical bottleneck in the plant life cycle, yet most GWASs in cotton have focused on yield or fiber quality rather than early-stage abiotic stress tolerance [38,40]. Filling this gap, we identified 1277 significant SNPs and integrated them into 94 QTLs, with 49 designated as stable QTLs. The distribution of these QTLs was uneven, with a high density on chromosomes A03 and A06. While this clustering might suggest the presence of “resistance gene clusters,” such enrichment may also be partially influenced by the underlying genomic architecture, including local linkage disequilibrium (LD) decay patterns, marker density distribution, or the specific genetic composition of the natural population. Therefore, these regions should be viewed as high-priority genomic “hotspots” rather than definitive evidence of adaptive selection alone. Discrepancies in specific loci positions compared to other studies [41] suggest that salt tolerance is governed by distinct genetic networks that vary with germplasm diversity and the specific salt tolerance indices used. Future studies employing fine-mapping are required to further distinguish between these genomic structural effects and genuine biological signals.
3.3. Hormonal Regulation and Candidate Gene Functions
The integration of GWASs with transcriptomic data suggests a potential mechanistic understanding of how candidate genes regulate physiological responses. Our study highlights the central role of phytohormones, particularly Auxin and Abscisic Acid (ABA), in mediating salt stress adaptation.
On chromosome A06, the candidate gene GH_A06G0675 was identified as a homolog of YUC2 may be involved in auxin biosynthesis. Auxin is consistent with the alleviation of salt-induced oxidative damage by enhancing antioxidant enzyme activities [42,43,44]. On chromosome A10, GH_A10G2042 encodes a RING-type E3 ligase homologous to Arabidopsis KEG, which is thought to negatively regulate ABA signaling by targeting ABI5 for degradation.
Therefore, these findings suggest a potential regulatory model for hormonal crosstalk during seed germination. Under salt stress, ABA accumulates as a core inhibitory signal, activating the downstream transcription factor ABI5 to maintain dormancy. Concurrently, YUC2-mediated auxin biosynthesis is initiated in active meristems such as the radicle, indirectly modulating ABA responses via ARF10/ARF16 [45]. KEG, acting as an E3 ubiquitin ligase, negatively regulates ABA signaling intensity by targeting ABI5 for degradation, thereby preventing excessive growth inhibition [46,47]. In this model, YUC2 and KEG synergistically regulate the balance of seed ABA responses through two dimensions: auxin synthesis flux and ABA signal attenuation rate. When environmental conditions become permissive, the reduction in ABA signaling and auxin-promoted radicle elongation jointly drive seed germination.
However, it is critical to state that these hormonal involvements are inferred solely from gene annotations and expression profiles. Because this study did not perform direct hormone measurements, signaling assays, or mutant analyses, these pathways should be regarded as proposed hypotheses rather than resolved mechanisms. Salt tolerance during germination likely involves complex multi-hormonal crosstalk that is currently beyond the resolution of this specific dataset.
3.4. Precision Mining of Key Regulators via Multi-Omics Integration
By using the intersection of GWAS candidates and RNA-seq DEGs as a prioritization filter, we identified three high-confidence candidate genes: GH_D06G0273, GH_A10G2036, and GH_D06G0283. This filtering approach relied on the consistency across independent genomic and transcriptomic datasets to prioritize targets for future functional validation, rather than providing direct evidence of causality. GH_D06G0273 (encoding an LRR-RLK) is a prioritized candidate due to its role as a potential cell-surface sensor for stress signals [48]. Similarly, GH_A10G2036 and GH_D06G0283 represent high-confidence nodes in transcriptional reprogramming and protein turnover. Given that the 73 candidate genes were deterministically prioritized within highly specific local LD blocks (e.g., a restricted 200 kb interval for qSSL-Gh6.3) rather than randomly sampled, formal genome-wide enrichment probability models (such as the hypergeometric test) were not applicable. Instead, this transcriptomic integration served exclusively as a strict biological filter to resolve multi-gene co-segregation within these physically linked loci.
3.5. Study Limitations and Perspectives
This study has several limitations that delimit the scope of our conclusions. First, the RNA-seq analysis was performed using only a single salt-tolerant genotype (CN4) and focused exclusively on root tissue; thus, the identified expression patterns may not be universal across the entire population or across different tissues like leaves. Second, the identification of candidate genes relies heavily on Arabidopsis-based functional annotations, which may not perfectly overlap with cotton-specific gene functions. Third, while the effective SNP threshold used in GWAS strictly controls for false positives, it is highly conservative and likely overlooks numerous genuine but low-effect loci due to the assumptions of the Bonferroni correction in the presence of LD.
These candidate genes provide a foundation for future functional validation and targeted genome editing approaches (such as CRISPR/Cas9) aimed at improving salt tolerance in cotton. However, these loci should be treated as candidates for further research rather than ready-to-deploy targets for immediate breeding applications. Future work involving functional assays and multi-environment field trials is essential to confirm the stability and effect size of these candidates.
4. Materials and Methods
4.1. Materials
Three hundred cotton germplasm accessions were sourced from the China Germplasm Resource Bank, encompassing a wide range of ecological regions within China. The collection is widely distributed, comprising 99 accessions from the Yellow River cotton-growing region, 42 from the Yangtze River cotton-growing region, 89 from the inland northwest cotton-growing region, 43 from the ultra-early-maturing cotton-growing region of Liaoning, and 27 foreign germplasm accessions. These accessions cover a broad spectrum of genetic backgrounds and maturity types (early, mid, and late), providing diverse allelic variation for the association study.
4.2. Experimental Design
A subset of sixteen accessions was randomly selected from the 300-accession panel for a pilot study under four NaCl concentrations: 0, 100, 150, and 200 mmol L^−1^. The Salt Injury Index (SII) approached 0.5 at 200 mmol L^−1^ NaCl, which was subsequently chosen as the optimal concentration for large-scale evaluation. Seeds treated with sterile water (0 mmol L^−1^ NaCl) served as the control group. For each accession, thirty-six healthy seeds (twelve seeds per replicate, three replicates) with black seed coats were selected. Seeds underwent surface sterilization with 75% ethanol for one minute, followed by soaking in 4% sodium hypochlorite for eight minutes, and were rinsed five times with sterile water. The seeds were wrapped in two layers of filter paper saturated with the treatment solution and incubated in a growth chamber at 25 ± 1 °C and 65% relative humidity in darkness. Germination was recorded when the radicle length exceeded half the seed length. Seven days after germination, six traits were measured: germination rate (GR), root length (RL), shoot length (SL), root fresh weight (RFW), shoot fresh weight (SFW), and total fresh weight (TFW). In this study, the individual Petri dish served as the fundamental experimental unit to avoid pseudoreplication. For each biological replicate (i.e., one Petri dish), phenotypic measurements were obtained by averaging the results of the 12 seeds after excluding the maximum and minimum values to mitigate outliers. The mean value across the three biological replicates was subsequently calculated to represent the final phenotypic value for each accession, and these accession-level collective means were utilized as the direct input values for the downstream GWAS model. The salt tolerance index (STI) and salt injury index (SII) were calculated as described below:
To ensure a thorough genetic dissection, GWAS was performed on both the raw traits under stress and their corresponding salt tolerance indices, as the latter accounts for inherent vigor differences across accessions under optimal conditions. Furthermore, a membership function method was employed to integrate these diverse traits into a single comprehensive score, allowing for a holistic assessment of salt tolerance.
The rationale for selecting 200 mmol L^−1^ NaCl for the main experiment was based on the need to maximize phenotypic variance within the diverse germplasm panel. A semi-lethal stress level, indicated by an SII of approximately 0.5, ensures that neither the resistant nor the susceptible accessions are severely limited in their physiological response, thus maintaining a wide range of phenotypic variation. This wide distribution of traits is critical for GWASs, as it increases the likelihood of detecting significant association signals related to salt tolerance mechanisms.
4.3. Data Processing
Data processing was conducted using Microsoft Excel 2016. Descriptive statistical analyses of phenotypic traits were carried out with SPSS Statistics 25. Frequency distribution histograms were generated in Origin 2022 to visually assess data distribution.
4.4. DNA Extraction and High-Throughput SNP Genotyping
Genomic DNA was extracted from fresh seedling leaves using the CWE9600 Magbead Blood DNA Kit (Cat. No. CWE9600, Kangwei Century, Beijing, China). Sequencing libraries were constructed through random fragmentation (300–350 bp), end-repair, PolyA tailing, and adapter ligation, followed by PCR amplification and purification. Libraries were sequenced on the DNBSEQ-T7 platform (BGI, Shenzhen, China) to generate 150 bp paired-end reads. Raw data were filtered using fastp [49] to remove adapters and low-quality reads (reads with >1% unknown bases or >50% bases with Q ≤ 5). Clean reads were aligned to the Gossypium hirsutum TM-1 reference genome (v2.1) using BWA-MEM (v0.7.17) [50]. Sorting and indexing were performed using Samtools (v1.7) [51]. PCR duplicates were marked using GATK MarkDuplicates. SNP calling was performed using the GATK (v4.1.8.0) HaplotypeCaller module [52]. The resulting VCF files were filtered using the VariantFiltration module with stringent criteria: individual missing rate ≤ 1%, SNP missing rate ≤ 1%, and minor allele frequency (MAF) > 0.05. Finally, 3,055,642 high-quality SNPs were retained for Principal Component Analysis (PCA), phylogenetic tree construction, population structure analysis, and GWASs.
4.5. Population Structure and LD Analysis
Genome-wide phylogenetic relationships were elucidated by constructing a Neighbor-Joining (NJ) tree using Tassel. Population genetic structure was assessed with Admixture software (v1.3.0), employing K values from 2 to 10 to investigate potential subdivisions within the panel, ranging from broader ecological groupings to finer sub-populations. The optimal number of clusters (K) was determined by identifying the value that minimized the cross-validation (CV) error. The results indicated that K = 6 provided the best fit for our data. Principal component analysis (PCA) was conducted using GCTA software (v1.94.1) to validate the ADMIXTURE results. Linkage disequilibrium (LD) decay was estimated by calculating the squared correlation coefficient (r^2^) between pairs of high-quality SNPs with PopLDdecay software (v3.43).
4.6. Genome-Wide Association Study
Genome-wide association studies (GWASs) were conducted using GEMMA (v0.94.1) [53] with four models: GLM (Q), GLM (P), MLM (Q+K), and MLM (P+K). The number of independent SNPs was determined using PLINK with the command --indep-pairwise 50 10 0.1, resulting in 182,147 effective SNPs [54]. The significance threshold was set at −log_10_(1/182,147) = 5.26. This Bonferroni-like threshold was utilized to maintain strict control over Type I error (false positives) amid the large volume of genetic markers tested. We acknowledge the inherent trade-off of this conservative approach, which may increase the risk of Type II error (false negatives) for minor-effect QTLs. To balance this, signals consistent across multiple models were also prioritized to complement the stringent statistical threshold. To minimize false-positive associations, loci detected by at least two models, particularly those consistent across both GLM and MLM frameworks, were prioritized for further analysis and candidate gene prediction. Manhattan and Q-Q plots were generated in R (v4.4.1) using the “ggplot2” and “qqman” packages, and LD heatmaps were produced with the “Ldheatmap” package.
4.7. Haplotype Analysis and Candidate Gene Prediction
After GWASs, significant SNPs for each trait were ordered by physical position. Adjacent SNPs were considered part of the same QTL if the physical distance between them was less than the estimated LD decay distance calculated in Section 4.5. For QTLs identified across multiple traits or markers, the SNP with the highest phenotypic variation explained (PVE) was selected for haplotype analysis. The Mann–Whitney U test was used to assess the significance of phenotypic differences between haplotypes. Candidate genes within the QTL interval defined by the LD decay distance were mapped to homologous genes in the Arabidopsis thaliana genome (www.arabidopsis.org, accessed on 20 April 2025) and further prioritized based on their functional annotation related to stress response and expression levels observed in our transcriptomic data.
4.8. Transcriptome Sequencing
Membership function clustering identified Zhongmian No. 4 (CN4), a high-salt-tolerant accession, for RNA sequencing. Seeds were germinated using the paper roll method for five days, then transferred to either 200 mmol L^−1^ NaCl or distilled water. Root tissues were collected at 1, 6, and 24 h post-treatment (labeled S1, S6, S24 for stress; C1, C6, C24 for control) and immediately frozen in liquid nitrogen. Three biological replicates were included for each treatment. While this design allowed for a detailed temporal analysis of the stress response in a representative tolerant genotype, we acknowledge it limits the generalizability of expression-based conclusions across the diverse cotton germplasm panel and other tissues. Library construction and sequencing were performed by Novogene Co., Ltd. (Beijing, China) on the Illumina HiSeq platform. Raw reads were filtered to remove adapters, N-containing reads, and low-quality reads (Qphred ≤ 5 bases accounting for >50% of read length). Clean reads were aligned to the Gossypium hirsutum TM-1 reference genome (v2.1) using HISAT2 [55] and assembled using StringTie [56] to reconstruct the transcriptome. Gene expression levels were estimated as Fragments Per Kilobase of transcript per Million mapped reads (FPKM). Data analysis was conducted using the Novogene NovoMagic platform. Differential expression analysis was performed with DESeq2 [57]. To control for the false discovery rate (FDR) in multiple testing, the p-values were adjusted using the Benjamini–Hochberg (BH) procedure. Differentially expressed genes (DEGs) were explicitly defined as those with an |Fold Change| ≥ 2 and FDR < 0.05. Specifically, DEGs within the previously identified QTL intervals were prioritized as potential candidate genes. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted using the Bioinformatics platform (http://www.bioinformatics.com.cn, accessed on 25 April 2025) to predict gene functions.
5. Conclusions
This study conducted a genome-wide association analysis using six salt tolerance traits and the salt tolerance index of 300 landraces of upland cotton under 200 mmol/L NaCl stress. A total of 1277 significant SNPs were detected and integrated into 94 QTLs, from which 49 stable QTLs were identified across multiple models. By utilizing transcriptomic data as a prioritization filter rather than proof of causality, we identified 73 genes potentially associated with salt tolerance, ultimately prioritizing three high-confidence candidate genes (GH_D06G0273, GH_A10G2036, and GH_D06G0283) for further investigation. It should be noted that these findings are hypothesis-generating and are currently limited by the use of a single genotype for transcriptome analysis and the absence of direct functional validation. Nonetheless, these prioritized candidates provide a valuable genomic resource and a foundation for future functional studies, including marker-assisted selection and precision genome-editing approaches (e.g., CRISPR/Cas) aimed at enhancing salt tolerance. Future research involving experimental validation and multi-environment trials will be essential to confirm these regulatory mechanisms, contributing to the development of climate-resilient cotton varieties.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Hassani A. Azapagic A. Shokri N. Global predictions of primary soil salinization under changing climate in the 21st century Nat. Commun.202112666310.1038/s 41467-021-26907-334795219 PMC 8602669 · doi ↗ · pubmed ↗
- 2Kolb A. Barsch K. Environmental factors and seed abundance influence seedling emergence of a perennial forest herb Acta Oecologica 20103650751310.1016/j.actao.2010.07.003 · doi ↗
- 3Chen X. Yu D. Ning K. Xu H. Bi Y. The Comprehensive Assessment of Saline Resistance in the Reed Germination Period of Alfalfa Germplasms Acta Agrestia Sin.20172511151125
- 4Yin Y. Fan S. Li S. Amombo E. Fu J. Involvement of cell cycle and ion transferring in the salt stress responses of alfalfa varieties at different development stages BMC Plant Biol.20232334310.1186/s 12870-023-04335-337370008 PMC 10294350 · doi ↗ · pubmed ↗
- 5Khoury M.J. Yang Q. The Future of Genetic Studies of Complex Human Diseases: An Epidemiologic Perspective Epidemiology 1998935035410.1097/00001648-199805000-000239583430 · doi ↗ · pubmed ↗
- 6Sun X. Zheng Q. Liu Y. Characteristics of Growth and Ion Absorbability in Different Genotypes of Cotton at Germinating and Seedling Stages under Salt Stresses Cotton Sci.200113134137
- 7Shahid M.A. Sarkhosh A. Khan N. Balal R.M. Ali S. Rossi L. Gómez C. Mattson N. Nasim W. Garcia-Sanchez F. Insights into the Physiological and Biochemical Impacts of Salt Stress on Plant Growth and Development Agronomy 20201093810.3390/agronomy 10070938 · doi ↗
- 8Arif Y. Singh P. Siddiqui H. Bajguz A. Hayat S. Salinity induced physiological and biochemical changes in plants: An omic approach towards salt stress tolerance Plant Physiol. Biochem.2020156647710.1016/j.plaphy.2020.08.04232906023 · doi ↗ · pubmed ↗
