Genome-Wide Evolution and Stress-Responsive Regulation of 2-Oxoglutarate-Dependent Dioxygenases in Gossypium
Mingjv Zhu, Peiyu Li, Yuanlong Wu, Abudukeyoumu Abudurezike, Sijia Liang, Chuanyin Zhu, Yi Zhou, Lin Xu, Zhibo Li, Shihe Jiang, Xinhui Nie, Shuangxia Jin

TL;DR
This study explores how 2OGD genes, which are involved in plant hormone metabolism, evolved and respond to stress in cotton, revealing their roles in growth and stress adaptation.
Contribution
The study provides a comprehensive analysis of the evolution and stress-responsive regulation of GA-related 2OGD genes in cotton.
Findings
583 2OGD genes were identified and grouped into three major classes, including GA-related subgroups.
Salt stress upregulated GhGA2OX1 and GhGA20OX2 while downregulating GhGA3OX1, correlating with reduced GA levels.
GA-related 2OGD genes show conserved motifs and stress-responsive cis-elements in their promoters.
Abstract
Purpose: Gibberellins (GAs) are key phytohormones that regulate plant growth, development, and responses to environmental stress, and their metabolism is mediated by 2-oxoglutarate-dependent dioxygenases (2OGDs). Cotton (Gossypium spp.) is a polyploid crop with a complex genome; however, the evolutionary characteristics and stress-responsive regulation of GA-related 2OGDs remain poorly understood. This study aimed to systematically investigate the evolution, expression patterns, and stress-associated regulation of the cotton 2OGD multigene family, with particular emphasis on GA-related members. Methods: 2OGD genes were identified genome-wide in four Gossypium species and Arabidopsis thaliana. Phylogenetic relationships, gene structures, conserved motifs, cis-acting regulatory elements, and synteny were analyzed. Transcriptomic data from multiple tissues and developmental stages,…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10- —National Key Research and Development Program of China
- —“Tianshan Talents” Program for the Top Young Innovative Talents
- —science and technology planning project of Tumushuke city Open Bidding for Selecting the Best Candidates
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPlant biochemistry and biosynthesis · Lipid metabolism and biosynthesis · Plant responses to water stress
1. Introduction
Plant hormones are essential signaling molecules that coordinate growth and development throughout a plant’s life cycle [1]. Among these, gibberellins (GAs) play central roles in biosynthetic networks, mediating responses to environmental cues and regulating processes from seed germination [2] to stem/leaf elongation [3] and reproductive development [4,5]. GAs also enhance plant tolerance to abiotic stresses, including drought, salinity, and extreme temperatures [6]. As key catalytic enzymes, 2-oxoglutarate-dependent dioxygenases (2OGDs) govern GA biosynthesis and catabolism, thereby playing essential roles in plant growth and development [7].
The 2OGD superfamily is among the most functionally diverse groups of enzymes in plants [8]. These enzymes require Fe^2+^ and 2-oxoglutarate as cofactors and catalyze a wide range of oxidative reactions, including hydroxylation, decarboxylation, and ring cleavage [9,10]. A specialized class of 2OGDs encompasses key GA metabolic enzymes, GA 20-oxidases (GA20ox), GA 3-oxidases (GA3ox), and GA 2-oxidases (GA2ox), which collectively regulate GA biosynthesis and inactivation [11,12]. Specifically, GA20ox and GA3ox sequentially catalyze the formation of bioactive GAs (e.g., GA_4_ in Arabidopsis thaliana (L.) Heynh. and GA_1_ in Oryza sativa L.), whereas GA2ox irreversibly inactivates GA molecules through site-specific hydroxylation [13]. This dynamic balance between synthesis and inactivation ensures GA homeostasis [6].
Precise regulation of 2OGDs is essential not only for normal plant development but also for their function as “molecular hubs” that integrate endogenous developmental programs with environmental cues. Conversely, disruption of 2OGD metabolism or signaling can severely compromise crop adaptability and agricultural productivity, a challenge that is particularly pressing given the increasing frequency of abiotic stresses driven by global climate change [14,15,16,17,18]. Understanding the molecular basis of 2OGD-mediated GA metabolism and signaling has therefore become a critical objective for crop improvement and sustainable agriculture [19,20]. Evidence from model systems underscores this importance: loss-of-function mutants of rice OsGA2ox9 exhibit reduced seed dormancy and accelerated germination [21,22], while A. thaliana GA20ox2-1 mutants display enhanced root hair elongation under salt stress [23,24]. These findings highlight the essential role of finely tuned GA metabolism in coordinating plant growth, development, and environmental adaptability.
The 2OGD-mediated GA regulatory network plays a key role in cotton growth, with specific members (e.g., GhGA20ox, GhGA3ox) exhibiting tissue- and stage-specific expression [25,26]. Although genome-wide identification and expression analyses of 2OGD genes have been reported in A. thaliana and related crops, these studies have largely focused on diploid or non-polyploid systems, or on limited GA-related subsets. Consequently, how polyploidization reshapes the evolutionary trajectories, regulatory architecture, and functional specialization of the 2OGD family remains unclear [27,28,29]. Comparative genomic studies have shown that the 2OGD family is highly conserved across species [30]. Building on this evidence, we selected A. thaliana, a model in which 2OGD-mediated biosynthetic and signaling pathways are well characterized and key genes such as GA20ox, GA3ox, and GA2ox have been experimentally validated [31], as a reference for systematic analysis.
Cotton (Gossypium spp.), a globally important fiber crop, as a recent allotetraploid, has undergone extensive homoeologous duplication and subgenome differentiation, processes that profoundly influence hormone networks, provides an excellent system for studying the evolution of hormone pathways under polyploidization [32]. Previous studies in cotton have mostly examined individual GA-related genes or isolated regulatory interactions, lacking a comprehensive, phylogeny-guided framework integrating multigene family expansion, cis-regulatory divergence, and stress-responsive expression across diploid progenitors and tetraploid descendants [33]. Therefore, a systematic, multi-species, evolution-informed analysis of the cotton 2OGD family is essential to understand how GA metabolism has been rewired during polyploid evolution and how this contributes to cotton-specific developmental traits and stress adaptation [34].
Collectively, based on previous genome-wide studies in model plants and crops, as well as the established importance of 2OGDs in plant metabolism and hormone regulation, we reasoned that the recent polyploid origin of Gossypium may have contributed to the expansion and diversification of the 2OGD gene family. In particular, polyploidization-associated genome duplication and subsequent sequence and regulatory divergence may have influenced the evolutionary retention and expression patterns of GA-related 2OGD genes. However, the extent to which these processes have shaped the cotton 2OGD repertoire has not yet been systematically examined. To address this knowledge gap, we conducted high-resolution comparative genomics and transcriptomics analyses of two tetraploid cultivated cotton species (Gossypium hirsutum L. and Gossypium barbadense L.), their diploid progenitors (Gossypium raimondii L. and Gossypium arboreum L.), and A. thaliana as a reference.
Therefore, the present study aimed to perform a comprehensive genome-wide identification and characterization of 2OGD genes in tetraploid cultivated cotton species and their diploid progenitors. Specifically, we sought to: (i) identify and classify 2OGD family members using comparative genomic and phylogenetic analyses; (ii) examine patterns of gene family expansion, duplication, and retention in the context of polyploid evolution; (iii) analyze promoter cis-regulatory elements to infer potential regulatory diversification; and (iv) investigate tissue-specific and abiotic stress-responsive expression profiles to provide insights into the possible roles of GA-related 2OGDs in cotton growth, development, and stress adaptation. Collectively, this study provides a foundational framework for future functional investigations of the 2OGD multigene family in polyploid crops.
2. Materials and Methods
2.1. Identification of 2OGD Gene Families
To establish a reliable seed dataset for the identification of 2-oxoglutarate-dependent dioxygenase (2OGD) genes in cotton, we first retrieved well-characterized A. thaliana 2OGD members from A. thaliana Information Resource (TAIR, https://www.arabidopsis.org, Accessed: 20 November 2024). In particular, we used A. thaliana GA-related dioxygenase (AT4G25420) as representative seed sequences, since these genes have been functionally validated in gibberellin biosynthesis and catabolism pathways. The corresponding protein sequences (TAIR10 release) were downloaded in FASTA format. A BLASTP (v2.15.0+) [35] search (E-value ≤ 1 × 10^−5^, minimum identity ≥ 30%, alignment coverage ≥ 50%, scoring matrix = BLOSUM62, gap opening penalty = 11, gap extension penalty = 1) was performed against the target species’ genomes to identify potential homologous sequences (parameters: -num_threads 8 -utfmt 6 -evalue 1 × 10^−5^). The alignment results from the BLASTP(v2.15.0+) search were processed using the Solar (v0.9.6, solar.pl -a prot -f axt -t 1 × 10^−5^) software to connect hits corresponding to individual genes. The connected sequences were further analyzed using the Genewise software (v2.4.1, genewise -gff -quiet) for genome-wide homology searches. This step also included gene structure prediction, resulting in a set of candidate genes. The predicted candidate genes were compared with the protein sequences of the target species. Redundant sequences were removed to obtain a non-redundant set of candidate genes. The Pfam (http://pfam.xfam.org/, Accessed: 20 November 2024) database [36] was used to predict structural domains for the candidate multigene family members (Pfam accession PF03171). Only genes containing the characteristic domains of the target gene families were retained. The final set of non-redundant genes with the correct domain features were indicate as members of the 2OGD gene families (Tables S1 and S2). In this study, reference genome data were obtained from the following sources. For the A subgenome, genome assemblies A1_HAU and A2_HAU.2 were retrieved from the CottonGen database, (https://www.cottongen.org/data/download/genome_diploid_A, Accessed: 27 November 2024). The D subgenome assembly, specifically D5_T2T_PKU, was also obtained from CottonGen, (https://www.cottongen.org/data/download/genome_diploid_D5, Accessed: 27 November 2024). For cotton (G. hirsutum L. cv. TM-1), the genome sequences “Ghirsutum_genome_HAU_v1.1.tar.gz” and “Ghirsutum_genome_HAU_v1.0.tar.gz” were downloaded from the Cotton Genomics and Genetic Improvement Group at Huazhong Agricultural University, (https://cotton.hzau.edu.cn/EN/Download.htm, Accessed: 21 November 2024). Similarly, the genome sequence of island cotton (3-79), “Gbarbadense_genome_HAU_v2.0.tar.gz,” was accessed via the same university’s download portal, (https://cotton.hzau.edu.cn/EN/Download.htm, Accessed: 27 November 2024).
2.2. Construction of Phylogenetic Tree of 2OGD Family
The sequences of 2OGD multigene family members corresponding to G. raimondii, G. hirsutum, G. barbadense, G. arboretum and A. thaliana (G. arboreum and G. raimondii as diploids, G. hirsutum and G. barbadense as tetraploids) were generated, respectively. Multi-sequence alignment of all 2OGD protein sequences was carried out using MUSCLE (v3.8.1551) [37] and phylogenetic trees were constructed using the IQ-TREE (v2.2.6) [38] software proximity method (v2, https://github.com/iqtree/iqtree2/, Accessed: 27 November 2024) with the calibration parameter Bootstrap being set to 1000. The software iTOL(v6) [39] was used to modify the evolutionary tree.
The genome assemblies of diploid and tetraploid cotton species were retrieved from publicly available databases. For the A-subgenome species, we downloaded G. arboreum (A1_HAU) and G. arboreum (A2_HAU.2) assemblies from CottonGen (https://www.cottongen.org/data/download/genome_diploid_A, Accessed: 27 November 2024). For the D-subgenome species, the G. raimondii (D5_T2T_PKU) genome was obtained from CottonGen (https://www.cottongen.org/data/download/genome_diploid_D5, Accessed: 27 November 2024). The tetraploid G. hirsutum TM-1 genome (Ghirsutum_genome_HAU_v1.1) and G. barbadense 3-79 genome (Gbarbadense_genome_HAU_v2.0) were downloaded from the Cotton Functional Genomics Database (https://cotton.hzau.edu.cn/EN/Download.htm, Accessed: 27 November 2024). All genome sequences and annotation files were used as references for subsequent homology searches, gene identification, and synteny analyses.
2.3. Structural Characterization of 2OGD Proteins
Conserved motifs in 2OGDs were indicate using MEME (http://meme-suite.org/, Accessed: 22 December 2024) [40] with default parameters (maximum number of motifs = 20, minimum motif width = 6, maximum motif width = 50, searching for motifs in each sequence individually). Gene structure diagrams (exon-intron organization) and functional domain architectures were generated by integrating GFF3 annotations from CottonGen [41] and NCBI Batch CD-Search [42]. Tertiary protein structures were predicted using SWISS-MODEL (https://swissmodel.expasy.org/, Accessed: 22 December 2024) with default settings. Upstream open reading frames (uORFs) within 2OGD transcripts were detected via uORFlight (http://www.rnairport.com:443/Tool_uORFFinder.php, Accessed: 22 December 2024) [43], analyzing initiation codon contexts (ICCu and ICCm) for regulatory potential.
2.4. Analysis of Gene Promoters in the Cotton
The elements in the promoter fragments of all genes (3000 bp upstream region of the initiation codon “ATG” of gene) in the 2OGD multigene family were used to perform the cis-acting element analysis with the online program PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, Accessed: 29 December 2024) [44]. R script was used to visualize the cis-elements of 2OGD multigene family members across different genomes.
2.5. Synteny Analysis in Cotton
To investigate the collinearity and to analyze the syntenic relationship among the four cotton species, the complete genome sequences of these cotton species along with respective genome annotation files were subjected to MCScanX (v1.0) [45]. Syntenic visualization was implemented and plotted using JCVI (v1.5) [46] with parameters of ‘jcvi.compara.synteny’ and ‘jcvi.graphics.synteny’, respectively. Ka/Ks ratios of collinear/paralogous 2OGD gene pairs were calculated using TBtools (version 1.108) with default settings [47]. Protein sequences were aligned by MUSCLE, converted to codon alignments, and Ka/Ks values were estimated with invalid results removed and pairs with Ks > 2 excluded (Table S3).
2.6. RNA-Seq Data Analysis of Different Tissues in G. hirsutum
To clarify the tissue-specific expression patterns of 2OGD genes and their responses to different stresses, publicly available transcriptome data from various tissues of the cotton (G. hirsutum L. cv. TM-1) variety—including roots, stems, leaves, petals, anthers, stigmas, ovules at different developmental stages, and fibers at different maturity stages (SRA: SRR1580635; BioProject PRJNA257154, release date: 7 February 2020)—were analyzed and processed [48]. Normalized expression values quantified as FPKM (Fragments Per Kilobase of transcript per Million mapped reads) were subsequently subjected to hierarchical clustering analysis using TBtools software (version 1.108) to generate tissue-specific heatmaps [47]. Finally, expression values for Class C genes of the 2OGD multigene family were log_2_-transformed [log2(expression + 1)] and visualized as boxplots—showing mean values (marked by black diamonds). Intergroup differences were evaluated via pairwise Wilcoxon tests with Benjamini–Hochberg (BH) correction. Small facets (subplots) correspond to individual tissues, each plotted with an independent y-axis scale. The reference genome of Gossypium used in this study was obtained from the NCBI BioProject (SRA: SRR10282153; PRJNA433615, HAU-AD1_v1.1, release date: 3 December 2018). The corresponding genome assembly and annotation files were downloaded and used for genome-wide identification and characterization of 2OGD genes [41].
2.7. Spatiotemporal and Stress-Responsive Expression Profiling Analysis in G. hirsutum
To investigate the spatiotemporal expression profiles of 2OGD genes in Gossypium species, transcriptomic datasets encompassing diverse tissue types and abiotic stress treatments (including salinity, thermal stress, hypothermia, and polyethylene glycol-induced drought simulation) were retrieved from the cotton (G. hirsutum L. cv. TM-1) Omics Database (COD, http://cotton.zju.edu.cn/, Accessed: 29 December 2024), a comprehensive platform for multi-omics research in cotton [49]. Differential expression analysis and normalization were performed using limma; genes with |log_2_FC| ≥ 1 and false discovery rate (FDR) < 0.05 were highlighted as differentially expressed genes (DEGs) (Tables S4 and S5). Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted using the clusterProfilerR (v4.12) package (Tables S6 and S7). For metabolomic data, the intensity matrix was log_2_-transformed and Pareto-scaled prior to identifying differentially abundant metabolites (DAMs) via fold change and Student’s t-test (adjusted p < 0.05). Integrated analysis involved mapping DEGs and DAMs to KEGG pathways and inferring gene-metabolite interactions using Pearson correlation coefficients. Network visualization was constructed using the igraph and ggraph packages in R. All statistical tests and visualizations were performed in R (version 4.3.1); key packages included ggplot2 (v3.51) (for principal component analysis (PCA), clustering, volcano plots, and heatmaps), ComplexHeatmap (v3.19) (for enhanced heatmap visualization), and Circlize (v0.4.17) (for enrichment bubble plots and multi-omics circular plots).
2.8. Integrated Transcriptomic and Metabolomic Analysis in G. hirsutum
All cotton (G. hirsutum) data (Table S8) used in this study were generated by the authors of reference [50]. Seed imbibition (3% NaCl for 0.5, 1 or 1.5 h; control in water), greenhouse pot culture in Fukang-saline soil [50] (Soil samples were collected from the top 0–20 cm of a saline–alkaline field located in Fukang City, Xinjiang, China (44°22′ N, 87°56′ E; elevation 460 m a.s.l, EC 4.09 dS m^−1^, pH 8.03) in September 2021, sampling of 144 seedlings (12 per replicate, 3 replicates × 4 treatments) at 25 day after planting (DAP), separation of shoot and root tissues, snap-freezing in liquid N_2_ and storage at −80 °C, construction of 12 Illumina HiSeq 2000 RNA-seq libraries (San Diego, CA, USA), and UPLC-MS/MS metabolite profiling (100 mg lyophilised powder per sample) were conducted by the original team; we performed only in silico re-analysis of the published datasets. Data processing was performed as described in Section 2.6.
2.9. Quantitative Real-Time PCR (qRT-PCR) Experiment Under Salt Stress
To elucidate the expression patterns of 2OGD genes under salt stress, leaf samples were collected from the cotton (G. hirsutum L. cv. TM-1) cultivar subjected to control or salt stress treatments. The salt stress treatment protocol was consistent with that described in reference [50]. Briefly, cotton seeds were imbibed in 3% (w/v) NaCl solution for 0.5, 1, or 1.5 h, while control seeds were imbibed in distilled water. Seedlings were subsequently grown in pots containing Fukang saline soil (electrical conductivity, EC = 4.09 dS m^−1^; pH = 8.03) under greenhouse conditions. Leaf tissues were harvested at 25 DAP from salt-treated and control plants, immediately frozen in liquid nitrogen, and stored at −80 °C until further analysis. Total RNA (4.0 μL per system) was extracted using TRIzol reagent (Thermo Fisher Scientific, Waltham, Massachusetts, USA). Each 20 μL reaction sample was run on a Bio-Rad IQ2 (Bio-Rad Laboratories, Hercules, California, USA) sequence detection system with Applied Biosystems software. RNA quality and concentration were assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA) and agarose gel electrophoresis. First-strand cDNA (2.0 μL per system) was synthesized using the PrimeScript™ RT reagent Kit (TaKaRa Bio, Kusatsu, Shiga, Japan).
Quantitative real-time PCR (qRT-PCR) was performed using ChamQ Universal SYBR qPCR Master Mix (Vazyme Biotech, Nanjing, Jiangsu, China) on an Applied Biosystems 7500 Real-Time PCR System (Thermo Fisher Scientific, Waltham, Massachusetts, USA). Each experiment included four biological replicates (independent plants) and two technical replicates (independent PCR reactions). Primer sequences used for qRT-PCR are listed in Table S9. Relative expression levels of 2OGD genes were calculated using the comparative Ct method (Table S10) [51], with UBQ7 (GenBank accession no. DQ116441) used as the internal reference gene [52].
2.10. Forecasting GA-Associated Downstream Pathway Proteins
The STRING [53] database (https://string-db.org/, Accessed: 25 September 2025) was used to find downstream pathway proteins linked to GA-related members of the 2OGD multigene family, with A. thaliana as the reference species. We used high-confidence protein–protein interactions to find possible targets that the GA-associated 2OGDs might control. The predicted interaction network was then imported into Cytoscape (v3.10.2) for visualization, which made it possible to build a complete network that shows how 2OGD genes and their downstream pathway proteins are related [54].
2.11. Hormone Extraction and Determination
The experiment was conducted in greenhouse pots. For seed pretreatment, cotton seeds were soaked in a 3% sodium chloride solution at room temperature for 0.5 h (N0.5), 1 h (N1), and 1.5 h (N1.5), with three biological replicates for each treatment. Hormone extraction was performed with appropriate modifications based on the methods described by Yang et al. [55]. Approximately 0.5 g of cotton seed samples was fully ground in liquid nitrogen, followed by the addition of 2 mL of 80% (v/v) methanol solution containing 1 mM butylated hydroxytoluene (BHT). The mixture was thoroughly vortexed to prevent oxidation. Subsequently, the mixture was centrifuged at 5000 rpm at 4 °C for 5 min, and the supernatant was transferred to a C18 solid-phase extraction (SPE) column for purification. The extraction process was repeated three times, and all filtrates were collected. The combined filtrates were transferred to a 5 mL EP tube, and methanol in the samples was completely removed using a dry nitrogen evaporator. The residue was dissolved and brought to a constant volume at a ratio of fresh weight to diluent of 1:3 (w/v). Hormone content was determined using the enzyme-linked immunosorbent assay (ELISA) method, following the instructions provided with the kit (specific manufacturer can be specified, e.g., Shanghai Enzyme-Linked Biotechnology Co., Ltd., Shanghai, China). The absorbance was measured at a wavelength of 492 nm using a microplate reader, and the hormone content of each treated sample was calculated based on the standard curve (Table S11).
3. Results
3.1. Identification and Chromosomal Distribution of 2OGD Genes in Cotton
To comprehensively identify members of the 2OGD gene superfamily in cotton, we performed BLAST (v2.15.0+) and Genewise searches, followed by validation using the Pfam [36] and NR databases [56]. In total, 583 2OGD genes were identified across four Gossypium species and A. thaliana. Specifically, 104, 189, 183 and 52 genes were detected in G. arboreum, G. barbadense, G. hirsutum, and G. raimondii, respectively, whereas 55 genes were found in A. thaliana (Figure 1; Table S1). The accuracy of these annotations was further confirmed using the NCBI Batch CD-Search tool [57].
Chromosomal mapping results showed that these 2OGD genes are located on all 26 chromosomes in the cotton genome (Figures S1–S4). That these tetraploid cotton genomes have more than twice the amount of 2OGD genes compared to their corresponding diploid parental lines indicated that a massive number of new gene copies have been introduced via gene duplication during the course of evolution of cultivated cotton.
3.2. Systematic Evolution and Functional Diversification of the 2OGD Superfamily in Cotton
To elucidate the phylogenetic and genetic relationships among 2OGDs from four Gossypium species (G. arboreum, G. barbadense, G. hirsutum, and G. raimondii) and A. thaliana, we first performed multiple sequence alignment (MSA) using MUSCLE, then constructed a maximum likelihood (ML) tree with IQ-TREE2 software using BNNI optimization. Based on this analysis, we partitioned the 2OGD multigene family across these five genomes into three major Classes (A, B, C), which reflect their distinct biological functions (Figure 2).
Class A primarily contains cotton flowering genes, ABC transporter family genes, and others, with a total of 218 members. Class B mainly includes 197 genes such as flavonoid/alkaloid biosynthesis genes, secondary metabolism genes, and phytohormone modification/response pathway genes. Class C consists of 168 GA-related dioxygenases, including 32 genes encoding GA2ox, GA20ox, and GA3ox (Table S2)—these enzymes are critical for GA biosynthesis and deactivation.
3.3. Analysis of Conserved Motifs in Cotton 2OGD Superfamily Members
Comprehensive structural analyses of 2OGD genes have made it clear how different species are related to each other and shown that protein motifs are very well preserved. Over 90% of cotton genes have motifs 1–6 (Figure 3A,B), which shows how important they are in terms of evolution.
There are different classes of conservation in different classes. Class A is the most conserved, Class C is next, and Class B is the least (Figure 3C,D). Most motifs in G. hirsutum are more stable than those in other cotton species, but Motif 8 is found in 67% of genes (Figure S9). In G. hirsutum, motifs from Classes B and C are more stable than those from Class A, with Class C being the most stable.
3.4. Conserved Architecture and Diversified Expansion Suggest Patterns of 2OGD Promoter Evolution
To investigate the regulatory architecture of metabolism-related 2OGD genes, we analyzed the 3 kb upstream promoter regions in four Gossypium species (G. arboreum, G. barbadense, G. hirsutum, G. raimondii) and A. thaliana. The thermal map analysis of cis elements (Figure 4A) shows that there is significant conservation among orthologous genes of the 2OGD multigene family among cis element species. Classification results of cis-elements showed that hormone-responsive elements (HREs)—including motifs responsive to abscisic acid, auxin, salicylic acid, and gibberellin (GA)—are widely distributed, while their abundance differs between cotton and A. thaliana. Specifically, the abundance in G. hirsutum and G. barbadense is approximately threefold that in A. thaliana (Figure 4B). Growth-related elements (GREs), which are associated with meristem activity and cell elongation, exhibit species-specific enrichment. The abundance of GREs in G. hirsutum and G. barbadense is roughly twofold that in A. thaliana (Figure 4C). Stress-responsive elements (SREs) participate in responses to drought, low temperature, and oxidative stress, and are extensively present in cotton. Their expression varies across species: the abundance in G. hirsutum and G. barbadense is about twice that in other cotton species, and over threefold that in A. thaliana (Figure 4D).
3.5. Gene Collinearity Analysis of the Cotton 2OGD Superfamily
To elucidate the evolutionary relationships and synteny conservation within the 2OGD multigene family across allotetraploid cotton species, we performed comparative genomic analyses using MCScanX (v1.0) [45] and JCVI (v1.5) [58] algorithms, encompassing G. arboreum, G. barbadense, G. hirsutum, and G. raimondii. These analyses revealed pronounced synteny, particularly between the allotetraploids G. barbadense and G. hirsutum, with 498 orthologous gene pairs indicate (Figure 5). Significant synteny was also observed between the diploid progenitor G. arboreum and the derived allotetraploid G. hirsutum (282 collinear pairs), as well as between G. barbadense and G. raimondii (133 pairs).
Examining 2OGD gene expansion dynamics, we observed an approximately 1.65-fold duplication rate in G. hirsutum relative to G. arboreum, where 94 ancestral genes expanded to 155 paralogs. A representative example is the Garb_08G026950 locus, which expanded into six orthologs in G. hirsutum. Chromosomal mapping confirmed conserved synteny between homologous chromosomes Ghir_A07 and Ghir_D07. Comparative analysis further indicate 180 orthologous gene pairs between G. barbadense and G. hirsutum, including complex duplication patterns, such as one G. hirsutum gene aligning to four G. barbadense orthologs. Clear orthology was evident between Gbar_A07 and Ghir_A07, and between Gbar_D07 and Ghir_D07. Additionally, Ka/Ks analysis of homologous gene pairs between A. thaliana and G. hirsutum revealed that most 2OGD genes have undergone intense purifying selection (Table S3).
3.6. Spatial and Temporal Expression Profiling of Cotton 2OGD Superfamily Members
To investigate the functional diversification and spatiotemporal expression characteristics of the 2OGD multigene family in cotton, Pearson correlation analysis was performed using RNA-seq data from various tissues, including roots, stems, leaves, petals, anthers, stigmas, ovules, and fibers (Figure 6A, Table S4). The results showed that the 2OGD family genes exhibited significant tissue specificity. The expression profiles of vegetative tissues (roots, stems, and leaves) were distinct from those of reproductive and developmental tissues (petals, anthers, stigmas, ovules, and fibers). Moreover, members of the 2OGD family displayed distinct tissue-specific expression patterns (Figure 6B–D). In Class A genes, Ghir_D12G022320 and other genes were highly expressed in vegetative organs but showed lower expression in reproductive organs, exhibiting a typical vegetative tissue-specific pattern. In contrast, genes such as Ghir_D05G017800 were significantly upregulated in reproductive organs. In Class B genes, Ghir_D12G025950 was highly expressed in vegetative tissues, whereas Ghir_D07G022890 showed higher expression in reproductive organs, displaying an opposite expression trend. Class C genes such as Ghir_D10G007720 and Ghir_A07G004560 were highly expressed in vegetative organs (roots, stems, and leaves), whereas Ghir_D01G003080 and Ghir_A07G015460 showed higher expression in reproductive organs (petals, anthers, stigmas, ovules, and fibers).
Additionally, we selected Class C genes associated with GA biosynthesis for focused study. Expression values were log_2_-transformed for visualization, and boxplots (Figure 7) revealed that these Class C genes—known to function as dynamic homeostatic regulators and participate in plant defense responses [59]—exhibited higher expression in roots, stems, and leaves relative to Class A and B genes, underscoring their central role in maintaining growth homeostasis [60,61].
3.7. Expression Dynamics and Functional Enrichment of 2OGD Genes Under Abiotic Stress
To investigate the stress-responsive expression patterns of 2OGD genes, we analyzed RNA-seq sequencing data of cotton plants under control (CK), high temperature (HT), low temperature (LT), salt (NaCl), and drought (polyethylene glycol, PEG) treatments. A heatmap of the expression levels of 52 2OGD genes revealed that different genes exhibited stress-specific transcriptional patterns (Figure 8, Table S5), indicating differential regulation of these genes under various abiotic stresses.
We further focused on Class C 2OGD genes (GA-related) and analyzed previously published transcriptomic and metabolomic data from control and NaCl-treated groups [50]. The original author soaked cotton seeds in a 3% sodium chloride solution at room temperature for 0.5 h (N0.5), 1 h (N1), and 1.5 h (N1.5), with three biological replicates for each treatment, generating transcriptomic and metabolomic data of cotton seeds under salt stress.
Further analysis revealed that the expressions of GhGA2OX1 (Ghir_D01G003720) and GhGA20OX2 (Ghir_D09G000450) were significantly upregulated under salt stress, while the expression of GhGA3OX1 (Ghir_D06G021240) was downregulated. These genes showed obvious differential expression (Figure 9A), suggesting that they may be involved in the salt stress response of cotton. To further explore the potential associations between genes of the 2OGD family and metabolites, a series of compounds related to GA metabolism (Tables S6 and S7) or signaling pathways were identified through KEGG analysis and literature review. These compounds include Mevalonic acid, Naringenin, Quercetin, Kaempferol, Isorhamnetin, Apigenin, Hesperetin, Tricin, D-Glucose, Glucose-1-phosphate, Glucose-6-phosphate (kz001131), Maltotetraose, UDP-glucose, Caffeic acid (kz000504), Ferulic acid (kz000511), Sinapic acid (kz000520), Procyanidins/Epicatechin (kz000972)/Dihydroquercetin, Salicylic acid, Indole-3-acetonitrile (3-IAN), Glutaric acid (kz000293), α-Ketoglutaric acid (kz000312), Phloretin (kz000417), Ribitol (kz001126), and D-Arabitol (kz001127).
Their specific functions are as follows: Mevalonic acid serves as a precursor in the upstream isoprene pathway of GA biosynthesis [62]; Flavonoids (such as Naringenin, Quercetin, Kaempferol, Isorhamnetin, Apigenin, Hesperetin, Tricin, Procyanidins, Epicatechin, Dihydroquercetin, and Phloretin) can affect GA responses by regulating GA signaling or interacting with DELLA proteins [63]; Carbohydrate compounds (D-Glucose, Glucose-1-phosphate, Glucose-6-phosphate, UDP-glucose, and Maltotetraose) are closely related to GA-mediated energy metabolism and seed germination [64]; Phenolic acids (Caffeic acid, Ferulic acid, and Sinapic acid) can regulate hormone balance, including GA synthesis and degradation [65]; Salicylic acid and Indole-3-acetonitrile exhibit antagonistic or interactive effects on GA signaling, respectively [66]; Carbon metabolic intermediates (Glutaric acid, α-Ketoglutaric acid, Ribitol, and D-Arabitol) are involved in GA-induced carbon flux redistribution [67,68]. Using transcriptomic and metabolomic data of Class C 2OGD genes (Table S8), the analysis revealed that some metabolites exhibited specific correlations with key genes involved in gibberellin (GA) metabolism (Figure 9B). Among these metabolites: Procyanidin C2, Dihydroquercetin (Taxifolin), Epicatechin, Naringenin 7-O-glucoside (Prunin), Procyanidin C1, and Procyanidin B4 all showed a positive correlation with GhGA2OX1 and GhGA20OX2, but had little or no correlation, or even a negative correlation, with GhGA3OX1. Uridine 5′-diphospho-D-glucose was positively correlated with GhGA20OX2, negatively correlated with GhGA3OX1, and showed no obvious correlation with GhGA2OX1. Additionally, GhGA3OX1 and GhGA20OX2 were negatively correlated with (R)-Mevalonic acid, while GhGA2OX1 had no significant correlation with this metabolite.
Subsequently, qRT-PCR validation showed that GhGA2OX1 and GhGA20OX2 were significantly upregulated under salt stress, while GhGA3OX1 was significantly downregulated (p < 0.001, Figure 9C, Table S10), which was consistent with the results of transcriptomic analysis. Furthermore, the subsequent determination of GA metabolite content (Figure 9D, Table S11) revealed that cotton GA exhibited an overall inhibitory trend under salt stress, and its growth rate was significantly lower than that of the control group. This is consistent with the expression pattern of differentially expressed genes in the transcriptome [69,70]. Subsequently, based on the high confidence data (score > 0.7) from STRING [53] database (Figure 10), PPI network analysis was performed to predict GA synthesis and metabolism. DELLA proteins (RGL3, RGL1, GAI, RGA, and RGL2) and GA3ox2, GA3ox2, and GA20ox1 proteins were particularly prominent in the network [71].
4. Discussion
Gibberellins (GAs) are diterpenoid phytohormones that regulate diverse developmental processes including seed germination, stem elongation, reproductive transition, and stress responses [72,73]. Their biosynthesis begins in plastids, where geranylgeranyl diphosphate is converted to ent-kaurene, and continues in the endoplasmic reticulum to produce GA_12_, the first stable intermediate [14,74]. Class C 2OGD enzymes, such as GA20ox and GA3ox, subsequently generate bioactive GAs, while GA2ox inactivates these molecules to maintain hormonal homeostasis [16,75,76]. This tightly regulated network ensures spatiotemporal precision in GA levels, enabling plants to balance growth promotion and stress tolerance [70,77].
Across four Gossypium species and A. thaliana, a total of 583 2OGD genes were identified (Figure 1). The higher gene number in tetraploid species—approximately 1.6 times that of diploids—may be attributable to extensive duplication events. Such an expansion could indicate that polyploidy serves as a potential driver of functional diversification and regulatory complexity in plants [78,79]. However, the extent of expansion varied among subfamilies (Figure 2), suggesting that specific enzyme classes involved in GA metabolism and stress responses may have undergone asymmetric evolutionary patterns. Structural analysis indicated that motifs 1–6 are highly conserved, while exon–intron architectures tend to be consistent within each Class but divergent across Classes (Figure 3). Promoter analysis revealed a significant enrichment of cis-acting elements related to hormone regulation, development, and stress responses in tetraploid cotton compared with its diploid progenitors (Figure 4). This may suggest that both gene duplication and promoter diversification contribute to enhanced environmental responsiveness in tetraploid cotton [80,81,82,83,84]. The strong collinearity observed between G. barbadense and G. hirsutum (498 orthologous gene pairs) further suggests a shared evolutionary trajectory in allotetraploid cotton (Figure 5), implying that whole-genome duplication may have facilitated gene retention and functional diversification in hormone-related pathways [32]. The spatiotemporal expression analysis (Figure 6) showed relatively high expression levels of Class C genes (Figure 7) in roots, stems, and leaves, suggesting that these genes may have undergone functional specialization within the 2OGD family.
Promoter analysis indicated an enrichment of cis-acting regulatory elements related to hormone signaling, development, and stress responsiveness in tetraploid cotton relative to its diploid progenitors (Figure 4). The presence of hormone-responsive motifs, including GA- and ABA-related elements, together with stress-associated MYB/MYC binding sites, suggests that promoter diversification following polyploidization may have contributed to increased regulatory flexibility of 2OGD genes [85,86]. In particular, the accumulation of GA-responsive cis-elements in Class C gene promoters may be associated with enhanced transcriptional responsiveness to hormonal dynamics under stress conditions [87]. The coexistence of hormone- and stress-related regulatory elements within individual promoters further implies potential regulatory cross-talk between GA signaling and stress-response pathways [88]. Collectively, these findings suggest that the combined effects of gene duplication and cis-regulatory variation may have played a role in shaping the functional diversification of the 2OGD gene family in tetraploid cotton.
The stress-response analysis (Figure 8) revealed that GA regulation may play dynamic and context-dependent roles under abiotic stress conditions. RNA-seq results further suggest that specific GA-related genes could be rapidly responsive to heat, cold, salt, and drought, with expression patterns varying between activation and repression. Among them, the GA related Class C 2OGD genes GhGA3OX1, GhGA20OX2, and GhGA2OX1 showed significant differential expression trends under salt stress (Figure 9): GhGA2OX1 and GhGA20OX2 were significantly upregulated, while GhGA3OX1 was downregulated.
The complex correlation pattern observed between various flavonoids and sugar metabolites and key genes related to gibberellin (GA) metabolism suggests that under salt stress, there may be a certain connection between the fine regulation of GA biosynthesis/inactivation pathways and secondary metabolism in plants, in order to achieve a dynamic balance between growth and stress adaptation [63]. The negative correlation between (R)-mevalonic acid (a precursor in GA biosynthesis) and GhGA3OX1/GhGA20OX2 may reflect a feedback regulatory mechanism within the GA biosynthetic pathway, which is an interesting aspect worthy of further investigation [75,89]. Existing research results have shown that plant gibberellin (GA) biosynthesis and metabolism genes are involved in plant resistance to salt stress [69,70], and the level of plant endogenous GA exhibits an inhibitory trend. This is consistent with the measured change trend of endogenous gibberellin (GA) content in cotton. The transcriptome still contains downregulated GhGA3OX1 [69], and upregulated GhGA2OX1 [90] and GhGA20OX2 genes [91]. This not only reflects the complexity of the metabolite-gene regulatory network, but also provides new targets for the genetic improvement of cotton.
The PPI network (Figure 10) uncovered core nodes governing gibberellin (GA) metabolism and signaling: GA3ox1, GA20ox1 and GhGA3ox2—key biosynthetic enzymes—form a dense interaction module with DELLA proteins (RGL3, RGL1, GAI, RGA and RGL2). These interactions, predicted from high-confidence STRING scores, remain hypotheses that require validation by co-immunoprecipitation or yeast two-hybrid assays to confirm their roles in maintaining endogenous GA levels and mediating signal transduction [92].
Based on analyses of expression levels, metabolite correlations, and PPI network features, we have identified genes that may be involved in maintaining GA homeostasis under salt stress. Among them, GhGA2OX1 may enhance salt tolerance by promoting GA inactivation [93], while GhGA3OX1 and GhGA20OX2 may participate in feedback regulation of GA synthesis [94]. In the future, through gene editing or overexpression technology, targeted regulation of cotton molecular breeding can be achieved, which not only provides effective candidate targets but also balances cotton growth and salt tolerance.
Through integrated comparative genomics, transcriptomics, metabolite association, and network analyses, this study provides a systematic overview of the 2OGD multigene family in cotton. We observed that polyploid cotton species harbor an expanded 2OGD repertoire relative to diploid progenitors, consistent with whole-genome duplication–associated gene retention, with GA-related subfamilies showing notable differences in expansion patterns. Conserved protein motifs but divergent gene structures and promoter cis-elements suggest potential regulatory divergence following polyploidization. Expression analyses further indicate that GA-related 2OGD genes display tissue- and stress-associated expression variation. Under salt stress, GhGA2OX1, GhGA3OX1, and GhGA20OX2 showed pronounced expression changes and metabolite associations, although causal relationships remain to be established. Overall, these results provide correlative evidence that polyploidy may have influenced the evolution and regulation of GA-related 2OGD genes in cotton, establishing a foundation for future functional studies.
Despite these advances, several limitations remain. First, the identification and annotation of the 583 2OGD genes relied primarily on computational and transcriptomic evidence, with limited functional validation. Future studies should employ CRISPR/Cas9-mediated knockout or overexpression approaches in cotton to directly assess phenotypic consequences. Second, this study focused predominantly on transcriptional regulation, giving limited attention to post-translational modifications, which are known to play critical roles in GA signaling [18,95]. Finally, the analysis emphasized cultivated cotton, with only limited comparison to wild relatives, thereby constraining insights into evolutionary and domestication effects.
Future research should integrate comparative genomics with functional genetics to fully exploit the potential of 2OGD genes in cotton improvement. Functional dissection of GhGA2OX1, GhGA3OX1, and GhGA20OX2 genes via CRISPR/Cas9 will clarify causal links between gene expansion, regulatory diversification, and stress adaptation. Comparative studies between wild and cultivated cotton could reveal how polyploidy and domestication have shaped GA metabolism. Furthermore, integrating transcriptomics and metabolomics will facilitate construction of a comprehensive GA metabolism-signaling-phenotype response framework. Ultimately, leveraging GA regulatory networks in cotton breeding may provide novel targets for improving plant height, stress tolerance, and fiber quality.
In conclusion, this study elucidates the evolutionary and functional characteristics of the cotton 2OGD superfamily, and highlights the pivotal roles of GA-related genes in growth regulation and stress adaptation. Among them, the specifically expressed GhGA2OX1, GhGA3OX1, and GhGA20OX2 genes provide new targets for cotton genetic improvement. These findings not only deepen our understanding of the evolutionary dynamics of plant hormone metabolism, but also offer valuable strategies for cotton genetic improvement and molecular breeding.
5. Conclusions
In this study, we systematically characterized the evolutionary dynamics and functional specialization of the 2OGD superfamily in cotton, with particular emphasis on GA-related members. Through comparative genomic, structural, and transcriptomic analyses, we demonstrated that polyploidization-driven gene expansion, promoter diversification, and transcriptional plasticity collectively shaped the diversification of cotton 2OGDs. Notably, Class C GA-related dioxygenases displayed organ-specific expression and dynamic regulation under multiple abiotic stresses, indicating their possible involvement in growth regulation and stress responses. Despite the remaining limitations in functional validation, our findings provide an evolutionary framework and candidate genes (GhGA2OX1, GhGA3OX1, and GhGA20OX2) for further exploration. These findings provide preliminary insights that may inform future molecular breeding strategies to improve cotton resilience and fiber quality.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Lo S.F. Yang S.Y. Chen K.T. Hsing Y.I. Zeevaart J.A. Chen L.J. Yu S.M. A novel class of gibberellin 2-oxidases control semidwarfism, tillering, and root development in rice Plant Cell 2008202603261810.1105/tpc.108.06091318952778 PMC 2590730 · doi ↗ · pubmed ↗
- 2Yu Z. Duan X. Luo L. Dai S. Ding Z. Xia G. How Plant Hormones Mediate Salt Stress Responses Trends Plant Sci.2020251117113010.1016/j.tplants.2020.06.00832675014 · doi ↗ · pubmed ↗
- 3Wu W. Zhu L. Wang P. Liao Y. Duan L. Lin K. Chen X. Li L. Xu J. Hu H. Transcriptome-Based Construction of the Gibberellin Metabolism and Signaling Pathways in Eucalyptus grandis × E. urophylla, and Functional Characterization of GA 20ox and GA 2ox in Regulating Plant Development and Abiotic Stress Adaptations Int. J. Mol. Sci.202324705110.3390/ijms 2408705137108215 PMC 10138970 · doi ↗ · pubmed ↗
- 4Shani E. Hedden P. Sun T.P. Highlights in gibberellin research: A tale of the dwarf and the slender Plant Physiol.202419511113410.1093/plphys/kiae 04438290048 PMC 11060689 · doi ↗ · pubmed ↗
- 5Pan C. Tian K. Ban Q. Wang L. Sun Q. He Y. Yang Y. Pan Y. Li Y. Jiang J. Genome-Wide Analysis of the Biosynthesis and Deactivation of Gibberellin-Dioxygenases Gene Family in Camellia sinensis (L.) O. Kuntze Genes 2017823510.3390/genes 809023528925957 PMC 5615368 · doi ↗ · pubmed ↗
- 6Zhang C. Nie X. Kong W. Deng X. Sun T. Liu X. Li Y. Genome-Wide Identification and Evolution Analysis of the Gibberellin Oxidase Gene Family in Six Gramineae Crops Genes 20221386310.3390/genes 1305086335627248 PMC 9141362 · doi ↗ · pubmed ↗
- 7Wang Y. Shi Y. Li K. Yang D. Liu N. Zhang L. Zhao L. Zhang X. Liu Y. Gao L. Roles of the 2-Oxoglutarate-Dependent Dioxygenase Superfamily in the Flavonoid Pathway: A Review of the Functional Diversity of F 3H, FNS I, FLS, and LDOX/ANS Molecules 202126674510.3390/molecules 2621674534771153 PMC 8588099 · doi ↗ · pubmed ↗
- 8Wang H. Li H. Li C. Liu S. Zhang P. The Antarctic moss 2-oxoglutarate/Fe(II)-dependent dioxygenases (Pn 2-ODD 2) enhanced the tolerance to drought and oxidative stress BMC Plant Biol.20252554910.1186/s 12870-025-06578-840295947 PMC 12036268 · doi ↗ · pubmed ↗
