Characterization and Comparative Analysis of the Complete Mitochondrial Genome of a Limestone-Endemic Endangered Plant Species Hemiboea yongfuensis (Gesneriaceae)
Xin-Yue Tao, Xin-Mei Qin, Qiang Zhang, Xiao-Li Yang, Yong-Bin Lu, Yan-Jun Tan, Peng-Wei Li, Xi-Yang Huang, Xiang Gan

TL;DR
This study reports the first complete mitochondrial genome of the endangered plant Hemiboea yongfuensis, revealing insights into its evolution and potential for conservation and breeding.
Contribution
The first complete mitogenome assembly for the genus Hemiboea, providing new data for plant mitochondrial evolution and phylogenetics.
Findings
The mitogenome is 619,997 bp long with 61 genes and a GC content of 43.63%.
Extensive transfer of chloroplast sequences into the mitogenome (11.74% of its length) and 429 predicted RNA editing sites were identified.
Phylogenetic analysis confirmed the monophyly of Gesneriaceae and other Lamiales families.
Abstract
Background: Hemiboea yongfuensis is a recently discovered, critically endangered species restricted to the limestone regions of Yongfu County, Guilin, Guangxi. No mitogenome data are currently available for Hemiboea species, which hinders study of the evolution of the mitochondrial genome, a genome that is structurally complex in plants and has been assembled far less often than the chloroplast genome. The gap also limits the use of mitochondrial genome data in phylogenetics and in plant adaptation, breeding, and conservation. Results: To characterize its mitochondrial features and variation and to explore the usefulness of mitochondrial genes in phylogenetics, we assembled the complete mitogenome of H. yongfuensis using PacBio HiFi long reads and analyzed its codon usage bias, RNA editing sites, repetitive sequences, sequence lateral transfer, phylogenetic…
Funding
- National Natural Science Foundation of China
- Guangxi Natural Science Foundation
- National Key Research and Development Program of China
Taxonomy
Topics: Plant and Fungal Species Descriptions · Genomics and Phylogenetic Studies · Plant Diversity and Evolution
1. Introduction
Mitochondria, the energy-producing centers of cells, are essential for the growth, development, and reproduction of organisms [1]. Research on plant mitochondria started around 1950, coinciding with the first isolation of mitochondria from animal and plant tissues [2]. Plant mitochondria not only participate in many metabolic processes related to energy production and the synthesis and breakdown of different compounds, but also act as carriers of genetic information through their genomes [3]. The mitochondrial genetic system is relatively independent of the cell nucleus, and its genetic information is often inherited from the maternal parent, which somewhat reduces the difficulty of genetic research [4]. However, assembling mitogenomes remains challenging due to their structural diversity, such as branched linear, single linear, circular, and mixed circular–linear forms [5,6,7]. Traditional short-read sequencing, despite its accuracy, often struggles to resolve these complexities and repetitive sequences, resulting in fragmented assemblies. Long-read sequencing has emerged as a more effective alternative, capable of spanning repetitive regions and accurately determining structural configurations. Additionally, the presence of extensive non-coding sequences, high-frequency repetitions, RNA editing, and plastid genome insertions [8,9,10,11] further complicates the assembly process. Some of these features also lead to substantial differences in the size of plant mitogenomes [12]. To date, there are significantly fewer complete plant mitogenomes reported compared to plant chloroplast genomes [10]. Although plant mitogenomes vary greatly in structure and size, the mitochondrial protein-coding genes tend to have conserved features. Some of these genes have been utilized in phylogenetic studies of plants, showing potential and importance in plant phylogenetics [13,14].
The Gesneriaceae family includes around 160 genera and more than 3800 species, making it one of the larger groups of tropical plants [15]. The family includes herbs, vines, and shrubs, which often have ornamental value because of their attractive flowers. Certain species have been used in traditional medicine, mainly to treat fever, cough, the common cold, snake bites, and pain, along with various infectious and inflammatory conditions [16]. As a member of the order Lamiales, Gesneriaceae provides important insights into the evolutionary history and mechanisms of a lineage characterized by zygomorphic and bilabiate flowers. However, assembling plant mitogenomes has long been difficult for two reasons: the short reads produced by Sanger or next-generation sequencing cannot span the long repeats that occur frequently in plant mitogenomes, and efficient assembly tools for such complex genomes have been lacking. In addition, the usefulness of mitochondrial sequences, for example in phylogenetics, has not been adequately tested, which has further discouraged the study of plant mitogenomes. Hence, despite the size of the family, only four species have had their mitogenomes assembled to date: Boea hygrometrica and Haberlea rhodopensis, which are known for extreme desiccation tolerance; Primulina hunanensis, an endangered species adapted to cave environments; and Oreocharis esquirolii, a species with an actinomorphic corolla, which is rare in the family [5,17,18,19]. Fortunately, the rapid development of PacBio and Oxford Nanopore long-read sequencing, along with recently designed assembly tools for long-read data, has dramatically improved the accessibility of complete plant mitogenomes [20,21,22].
Hemiboea C. B. Clarke (1888) is a genus of perennial herbaceous plants in Gesneriaceae, consisting of about 42 species and five varieties [23]. They are mainly found in southern China, with some species also present in northern Vietnam and Japan [24,25]. Among these, H. yongfuensis has been found only in Yongfu County, Guilin, Guangxi, growing on limestone rocks (Figure 1). Owing to its limited population size, restricted distribution area, and the risk of habitat degradation, the species was assessed as critically endangered according to the IUCN standards [26]. Previous phylogenetic studies of Hemiboea were based solely on plastid and nuclear ribosomal ITS sequences [27]. As of now, no mitogenome of the genus Hemiboea has been assembled or utilized for phylogenetic studies, despite the availability of complete mitogenomes for 53 other species within the order Lamiales in NCBI.
In this study, we assembled the complete mitogenome of H. yongfuensis using PacBio long-read sequencing; it is the first assembled mitogenome in the genus. We comprehensively analyzed its gene composition, codon usage preferences, repetitive sequences, RNA editing, and homology between the mitochondrial and chloroplast genomes, as well as its synteny with previously published Gesneriaceae mitogenomes and phylogenetic relationships in Lamiales. Our main aims were to uncover the evolutionary processes of the mitogenome and to assess its usefulness for phylogenetics in Lamiales. Although the focus was on the mitogenome, the chloroplast genome of the same individual was also assembled because it serves as a critical reference for identifying plastid-derived DNA insertions in the mitogenome. The assembled mitogenome expands the genomic resources for Gesneriaceae, a diverse plant group of substantial ornamental, horticultural, and medicinal value, and should serve as a valuable asset for developing molecular markers to further explore evolution and adaptation and to inform conservation and breeding.
2. Materials and Methods
2.1. Plant Materials, DNA Extraction, and Sequencing
Although H. yongfuensis was assessed as critically endangered (CR) according to the IUCN standards [26], it has not been included in any officially issued list of protected plant species. The material we collected came from a wild, non-private area (i.e., not from any nature reserve or land owned by a third party), and therefore no collection permit or license was required. Fresh leaves were collected from a wild individual in Luojin Town, Yongfu County, Guilin city, Guangxi, southern China (110°8′17″ E, 24°58′36″ N, 284 m). The fresh leaves were rinsed with ultrapure water and immediately frozen in liquid nitrogen. Total DNA was then extracted using a modified CTAB method [28], and its concentration and integrity were assessed with a NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA) and a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). For short-read sequencing, genomic DNA was randomly fragmented using a Covaris ultrasonic homogenizer to generate a short-read library with 150 bp inserts. Paired-end sequencing was conducted on the Illumina NovaSeq 6000 platform, yielding about 218 Gb of sequencing data. For long-read sequencing, a PacBio HiFi long-read library (SMRTbell library) with an average insert size of about 17 kb was prepared. Sequencing was conducted on the PacBio Revio platform, producing 49 Gb of sequencing data with an average read length of 17,269 bp and an N50 value of 16,876 bp. All library preparation and sequencing were carried out by Wuhan Benagen Technology Company (Wuhan, China).
2.2. Assembly and Annotation of Mitogenome
Mitogenome assembly was performed using PacBio HiFi long-read sequencing data. The raw sequencing data were assembled de novo with Flye (v2.9.1-b1780) [29] under default parameters to generate graphical assembly outputs in GFA (Graphical Fragment Assembly) format. All resulting contigs in FASTA format were compared by BLASTn (v2.13.0) [30] with the reference mitogenomes of Arabidopsis thaliana and P. hunanensis to identify contigs containing mitochondrial sequences. A custom BLAST database was constructed using makeblastdb, and homology searches were run with the parameters -evalue 1e-5 -outfmt 6 -max_hsps 10 -word_size 7 -task blastn-short. The GFA file was visualized and analyzed using Bandage (v0.8.1) [31], and mitochondrial contigs were filtered based on the BLASTn results to obtain the draft mitogenome of H. yongfuensis (Figure 2A). Each node in the visualization corresponds to an assembled contig, and a black line connecting two nodes indicates an overlap between the respective contig sequences. Collectively, these sequences form a complex, multi-branched circular genome structure. Key branching nodes were resolved using long-read support: long reads were mapped to the sequences at these nodes, and a long read that aligns continuously across two connected sequences along the black line was taken as evidence for their connection. When multiple alternative connections existed at a branching node, those supported by a greater number of long reads were given priority. Through this process, the most plausible structure of the H. yongfuensis mitogenome (Figure 2B) was determined.
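The read-support rule for branching nodes can be illustrated with a minimal Python sketch. It is not the procedure or software used in this study: the input format (one ordered list of contig names per mapped long read), the function name pick_connections, and the contig edge_4 are hypothetical, and contig orientation is ignored for brevity.

```python
from collections import Counter


def pick_connections(read_paths):
    """For each contig, keep the outgoing connection spanned by the most reads.

    read_paths: one list of contig names per long read, in the order the
    read traverses them (hypothetical input format, orientation ignored).
    Returns {contig: (chosen_neighbour, number_of_supporting_reads)}.
    """
    support = Counter()
    for path in read_paths:
        # Each adjacent contig pair in a read is one spanning observation.
        for left, right in zip(path, path[1:]):
            support[(left, right)] += 1

    best = {}
    for (left, right), n_reads in support.items():
        # Prefer the alternative supported by more long reads.
        if left not in best or n_reads > best[left][1]:
            best[left] = (right, n_reads)
    return best


# Toy data: edge_3 has two alternative neighbours (edge_2 and edge_4).
reads = [
    ["edge_3", "edge_2"],
    ["edge_3", "edge_2", "edge_1"],
    ["edge_3", "edge_4"],
    ["edge_2", "edge_1"],
]
chosen = pick_connections(reads)
print(chosen)
```

In this toy input the connection from edge_3 to edge_2 is spanned by two reads and the alternative by one, so the better-supported connection is kept. A real implementation would also need to track strand and alignment quality.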
Annotation of the mitogenome was performed using PMGA [32] and GeSeq (v2.03) [33], with A. thaliana (NC_037304), P. hunanensis (NC_087815), and O. esquirolii (PQ850635) used as references. After further manual verification and correction, the annotated mitogenome was deposited in GenBank. The annotation results were visualized with OGDRAW [34].
2.3. Assembly and Annotation of Chloroplast Genome
The chloroplast genome assembly was conducted using GetOrganelle software (v1.7.5) [35] with default parameters from short-read sequencing data. The chloroplast genome was annotated using the CPGAVAS2 web service (http://www.herbalgenomics.org/cpgavas2/ accessed on 17 September 2025) [36] with H. yongfuensis (GenBank: NC_079573) as the reference. Finally, the assembled and annotated H. yongfuensis chloroplast genome was submitted to GenBank. This assembly and annotation primarily aimed to facilitate chloroplast–mitochondrial homology analysis using the same sample.
2.4. Analysis of Codon Usage and Repeated Sequences
The protein-coding genes (PCGs) of the mitogenome were initially extracted using PhyloSuite software (v1.2.2) [37]. Thereafter, CodonW software (v1.4.2) [38] was utilized to calculate the relative synonymous codon usage (RSCU) values and conduct codon usage bias analysis.
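For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following Python sketch computes it from coding sequences; it is an illustration under the standard genetic code, not the CodonW implementation, and the function name rscu and the toy sequence are assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for every amino acid observed at least once."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group synonymous codons by the amino acid (or stop) they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # equal-usage expectation
        for c in codons:
            values[c] = counts[c] / expected
    return values


vals = rscu(["ATGGCTGCTGCATGGTAA"])  # toy CDS: M A A A W stop
print(round(vals["GCT"], 3), vals["ATG"], vals["TGG"])
```

Because Met (ATG) and Trp (TGG) each have a single codon under the standard code, their RSCU is 1 by definition whenever they occur.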
Simple sequence repeats (SSRs) were detected using MISA (v2.1) (https://webblast.ipk-gatersleben.de/misa/ accessed on 29 June 2025) [39] with minimum repeat numbers set to 10, 5, 4, 3, 3, and 3 for mono- to hexa-nucleotide motifs, respectively. Tandem repeat analysis was performed using Tandem Repeats Finder (TRF v4.09; https://tandem.bu.edu/trf/trf.unix.help.html accessed on 29 June 2025) [40] with default settings. For dispersed repeats, REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer/ accessed on 29 June 2025) [41] was employed to identify four categories: forward (F), palindromic (P), reverse (R), and complementary (C) repeats.
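The SSR thresholds above can be made concrete with a small regex-based sketch in Python. It is not MISA and may differ from it in edge cases; the function names, the toy sequence, and the rule of skipping motifs that are themselves repeats of a shorter unit (for example "AA" or "ATAT") are assumptions made for illustration.

```python
import re

# Minimum number of repeats per motif length (mono- to hexa-nucleotide).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_degenerate(motif):
    """True if the motif is a repeat of a shorter unit, e.g. 'ATAT'."""
    n = len(motif)
    return any(motif == motif[:k] * (n // k)
               for k in range(1, n) if n % k == 0)


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeats) tuples, 1-based inclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_degenerate(motif):
                pos = m.start() + 1  # retry one base further on
                continue
            repeats = (m.end() - m.start()) // unit_len
            hits.append((m.start() + 1, m.end(), motif, repeats))
            pos = m.end()
    return sorted(hits)


toy = "GG" + "A" * 12 + "CC" + "AT" * 6 + "GG" + "AGC" * 3
ssrs = find_ssrs(toy)
print(ssrs)
```

In the toy sequence the A run (12 repeats) and the AT run (6 repeats) pass their thresholds, while the AGC run (3 repeats) falls short of the trinucleotide minimum of 4 and is not reported.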
2.5. Chloroplast–Mitochondrial Homology Analysis
Homologous sequences shared by the mitochondrial and plastid genomes (MTPTs) were identified with BLASTn (v2.13.0) [30] under the following settings: -evalue 1e-5, -word_size 10, and -outfmt 6. TBtools (v2.309) [42] was then used to visualize chloroplast-to-mitochondrial gene transfer events.
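As an illustration of how tabular BLAST output (-outfmt 6) can be summarized into a total homologous length and a share of the mitogenome, the Python sketch below merges overlapping hit intervals on the subject sequence. Several points are assumptions rather than the authors' procedure: that the mitogenome is the subject sequence, that overlapping hits are merged before summing, the 80% identity filter as a parameter, and the toy hit lines.

```python
def merged_length(intervals):
    """Total number of positions covered by (start, end) intervals, inclusive."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping hit: extend the block
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def mtpt_coverage(blast_lines, mito_length, min_identity=80.0):
    """Return (covered_bp, percent_of_mitogenome) from outfmt 6 lines.

    Columns used: 3 = percent identity, 9 and 10 = subject start and end.
    Subject coordinates are reversed for minus-strand hits, so sort them.
    """
    intervals = []
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if float(fields[2]) < min_identity:
            continue
        start, end = sorted((int(fields[8]), int(fields[9])))
        intervals.append((start, end))
    covered = merged_length(intervals)
    return covered, 100.0 * covered / mito_length


toy_hits = [
    "cp\tmt\t98.5\t100\t1\t0\t1\t100\t1001\t1100\t1e-40\t180",
    "cp\tmt\t95.0\t50\t2\t0\t200\t249\t1150\t1101\t1e-15\t80",  # minus strand
    "cp\tmt\t90.0\t80\t5\t0\t300\t379\t1051\t1130\t1e-20\t110",  # overlaps
    "cp\tmt\t75.0\t60\t15\t0\t400\t459\t5000\t5059\t1e-6\t50",  # below 80%
]
covered, percent = mtpt_coverage(toy_hits, mito_length=10_000)
print(covered, percent)
```

The same percentage arithmetic applied to the totals reported in this paper, 72,759 bp out of 619,997 bp, gives 11.74%.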
2.6. RNA Editing Sites Prediction
RNA editing sites within all PCGs encoded by the H. yongfuensis mitogenome were predicted using Deepred-Mt [43], a convolutional neural network (CNN)-based model. Predictions with probability scores exceeding 0.9 were retained for subsequent analysis.
2.7. The Mitogenome Comparative Analyses of H. yongfuensis to Other Gesneriaceae
Pairwise BLAST (v2.16.0) comparisons were performed among the mitogenome sequences of the five Gesneriaceae species, including H. yongfuensis. Searches used an E-value threshold of 1e-5 and a word size of 7, with output in tabular format (-outfmt 6). Alignments were retained only if they showed over 80% nucleotide identity and spanned more than 500 bp. The multiple synteny plot was then generated using TBtools (v2.309).
Gene nucleotide sequences were aligned using MAFFT (v7.505) [44]. Nucleotide diversity (Pi) for each aligned gene was then calculated in DnaSP (v5.10) [45]. The resulting Pi values were visualized as a line graph using GraphPad Prism (v10.1.2).
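Nucleotide diversity is the average number of pairwise differences per site among the aligned sequences. A minimal Python sketch of this calculation follows; it is not DnaSP, and the choice to drop every alignment column containing a gap or ambiguous base, the function name, and the toy alignment are assumptions.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site over gap-free columns."""
    seqs = [s.upper() for s in aligned]
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two sequences of equal aligned length")

    # Keep only columns where every sequence has an unambiguous base.
    columns = [col for col in zip(*seqs) if all(b in "ACGT" for b in col)]
    if not columns:
        return 0.0

    pairs = list(combinations(range(len(seqs)), 2))
    differences = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return differences / (len(pairs) * len(columns))


toy_alignment = [
    "ATGCATGCAT",
    "ATGCATGCAT",
    "ATGCTTGCAT",  # one substitution at column 5
    "ATG-ATGCAT",  # gap column is excluded for all sequences
]
pi = nucleotide_diversity(toy_alignment)
print(round(pi, 5))
```

With four sequences there are six pairs; the single variable column contributes three pairwise differences over nine usable columns, so Pi = 3 / (6 × 9).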
2.8. Phylogenetic Analysis
Mitogenome sequences of 56 species, comprising all available Lamiales species and 2 Solanaceae species designated as the outgroup, were retrieved from NCBI. A total of 25 conserved PCGs from these species were extracted using PhyloSuite (v1.1.16) [37]. These PCGs were subsequently aligned with MAFFT (v7.505) [44], and segments that were excessively divergent or poorly aligned were masked using the alignmentFilter package through a sliding window grouping–regrouping strategy (with the stringency parameter prob set to 0.0001) [46]. The resulting alignment was then used to construct a maximum likelihood (ML) phylogenetic tree using IQ-TREE (v1.6.12) [47,48], with branch support assessed through 1000 ultrafast bootstrap replicates (-B 1000) and 1000 SH-aLRT replicates (-alrt 1000). The phylogenetic tree was visualized using iTOL (v6) [49].
3. Results
3.1. Structural Characteristics of the Mitogenome of H. yongfuensis
We successfully assembled the mitogenome of H. yongfuensis and present it as a single linear molecule, which may be only one of several possible conformations. Figure 2A shows the assembly graph of the H. yongfuensis mitogenome; the length and sequencing depth of each node are given in Table S1. The linear contig obtained after resolving the multifurcating nodes (at the black connecting lines) with long-read data is illustrated in Figure 2B, with the solution path edge_3-edge_2-edge_1.
The mitogenome of H. yongfuensis measures a total length of 619,997 bp, with a GC content of 43.63% (Figure 3). The annotation identified 37 distinct PCGs, consisting of 24 core mitochondrial genes and 13 non-core genes, along with 21 tRNA genes (including 5 multi-copy tRNAs) and 3 rRNA genes (Table 1). Among the core genes, there is one protein transport subunit gene (mttB), nine NADH dehydrogenase genes (nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, and nad9), and one gene for ubiquinol–cytochrome c reductase (cob). Additionally, the core gene set includes three cytochrome c oxidase genes (cox1, cox2, and cox3), one maturase gene (matR), and four cytochrome c biogenesis genes (ccmFC, ccmFN, ccmC, and ccmB), along with five ATP synthase genes (atp1, atp4, atp6, atp8, and atp9). Non-core genes comprise four ribosomal large subunit genes (rpl2, rpl5, rpl10, rpl16), seven ribosomal small subunit genes (rps3, rps4, rps7, rps10, rps12, rps13, rps14), and two succinate dehydrogenase genes (sdh3, sdh4).
3.2. Codon Usage of PCGs
In the complete mitogenome of H. yongfuensis, we identified 10,546 codons among the 37 PCGs. All 64 codons are used, falling into 21 classes (the 20 amino acids plus the termination signal). The analysis of codon usage bias (Figure 4) showed a general preference among synonymous codons in the mitochondrial PCGs. The exceptions are the start codon ATG (Met) and TGG (Trp), which have no synonymous alternatives and therefore have RSCU values of exactly one, and GCA (Ala), whose RSCU value is also one. Specifically, 31 codons have RSCU values greater than one, indicating that they are used more often than expected under equal usage of synonymous codons. For instance, alanine (Ala) shows a strong preference for the codon GCT, which has the highest RSCU value (1.59), while the stop codon TAG has the lowest value (0.43).
3.3. Analysis of Repeat Sequences
An analysis of repetitive sequences within the mitochondrial genome of H. yongfuensis revealed a total of 121 simple sequence repeats (SSRs), as detailed in Table S2. These include 27 monomeric SSRs (22.31%), 28 dimeric SSRs (23.14%), 17 trimeric SSRs (14.05%), and 43 tetrameric SSRs (35.54%), the last of which represent the largest proportion. There are also six pentameric SSRs (4.96%), but no hexameric SSRs were found (Figure 5A). Furthermore, four tandem repeat sequences, each with a matching degree greater than 79% and lengths varying from 16 to 24 bp, were identified in the mitogenome (Figure 5B; Table S3). The mitogenome of H. yongfuensis also contains a large number of dispersed repeats, totaling 294 pairs with lengths of 30 bp or more (Table S4). These consist of 122 pairs of palindromic repeats (P), 171 pairs of forward repeats (F), and 1 pair of reverse repeats (R), with no complementary repeats detected (Figure 5B; Table S4). The longest palindromic repeat measures 141 bp, while the longest forward repeat extends to 4556 bp (Table S4).
3.4. Homology Analysis of Genomic Sequences
A sequence similarity analysis revealed 74 homologous mitochondrial plastid sequences (MTPTs) in H. yongfuensis. These sequences with over 80% similarity between the cpDNA and mitogenome measure a total of 72,759 bp, which accounts for 11.74% of the overall mitogenome length (Figure 6). Of these, MTPT1 is the longest, with a length of 4956 bp (Table S5).
Annotation of these homologous sequences identified a total of 31 complete genes among the 74 MTPTs. These include 21 PCGs (rps11, rps14, rps8, rps2, ndhJ, ycf15, psbB, rpl23, rpl2, rpl14, rpl36, psbA, psbC, psbD, psbL, psbF, psbE, petL, petG, infA, and rpoB) and 10 tRNA genes (trnA-UGC, trnD-GUC, trnH-GUG, trnI-CAU, trnL-CAA, trnL-UAA, trnN-GUU, trnW-CCA, trnS-GGA, and trnfM-CAU). In addition, several incomplete chloroplast genes were detected within these homologous sequences, including ycf2, ndhK, rpoA, rps4, and others (Table S5).
3.5. RNA Editing Site Prediction
Deepred-Mt (https://github.com/aedera/deepredmt/ accessed on 20 September 2025) was used to predict RNA editing events in the 37 PCGs of the H. yongfuensis mitogenome. Using a probability cutoff of 0.9 (scores range from 0, no editing, to 1, editing, with values closer to 1 indicating a higher likelihood of editing), a total of 429 potential RNA editing sites were identified across these 37 PCGs. All of these sites are C-to-U conversions (Table S6). The ccmB gene had the highest number of predicted editing sites (37), followed by nad4 (36), whereas rps7 and atp1 each had only one (Figure 7). Further analysis revealed that a total of 44 codon variants were associated with RNA editing sites. The largest share of edits (47.09%) was predicted to change the amino acid from hydrophilic to hydrophobic, 43.12% were expected to leave its hydrophobic or hydrophilic character unchanged, and 9.32% were predicted to change it from hydrophobic to hydrophilic. In addition, editing was predicted to create stop codons in the atp6 and rps10 genes (Table S7).
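The categories used above (hydrophilic to hydrophobic, unchanged, hydrophobic to hydrophilic, and new stop codon) can be reproduced for any single C-to-U edit with a short Python sketch. The hydrophobic residue set below is one common convention and an assumption here, as are the function name and the example codons; the classification scheme actually used for Table S7 may differ.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Assumed hydrophobic set; other classifications place G, C or Y differently.
HYDROPHOBIC = set("AVLIMFWP")


def classify_edit(codon, pos):
    """Apply a C-to-U edit (written as C-to-T in DNA) at codon position pos (0-2).

    Returns (edited_codon, category).
    """
    codon = codon.upper()
    if codon[pos] != "C":
        raise ValueError("edited position must be a C")
    edited = codon[:pos] + "T" + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]

    if after == "*":
        category = "stop gained"
    elif (before in HYDROPHOBIC) == (after in HYDROPHOBIC):
        category = "class unchanged"
    elif after in HYDROPHOBIC:
        category = "hydrophilic to hydrophobic"
    else:
        category = "hydrophobic to hydrophilic"
    return edited, category


for codon, pos in [("TCA", 1), ("CCA", 0), ("CAA", 0), ("CTT", 0)]:
    print(codon, pos, classify_edit(codon, pos))
```

For example, editing the second position of TCA (Ser) gives TTA (Leu), a hydrophilic-to-hydrophobic change, whereas editing the first position of CAA (Gln) gives the stop codon TAA.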
3.6. Collinearity Analysis
In the collinearity analysis, collinear blocks shorter than 0.5 kb or with similarity below 80% were omitted. A large number of homologous collinear blocks were identified between H. yongfuensis and its relatives within Gesneriaceae (Figure 8). Comparing H. yongfuensis with P. hunanensis and with O. esquirolii showed that highly homologous regions are more broadly distributed between H. yongfuensis and P. hunanensis (Figure 8). The homologous region is 447,434 bp long in H. yongfuensis, accounting for 72.17% of its mitogenome, and 447,862 bp long in P. hunanensis, accounting for 77.86% of its mitogenome (Table S8). The largest collinear block, approximately 13,188 bp, was also found in the comparison with P. hunanensis (Table S8). The arrangement of collinear blocks varied considerably among the mitogenomes of the four Gesneriaceae species, and H. yongfuensis showed extensive genomic rearrangements relative to its relatives from the same family (Figure 8).
3.7. Nucleotide Diversity
Nucleotide diversity (Pi) is a key metric for evaluating genetic variation in nucleotide sequences among species and populations. Highly variable regions can serve as potential molecular markers for differentiating between populations. We calculated Pi for the mitochondrial genes of the four Gesneriaceae species. The most variable gene was cox2 (Pi = 0.03333), followed by nad2 (Pi = 0.025) (Figure 9). Because even these maximum values are low, the nucleotide sequences of mitochondrial PCGs in Gesneriaceae appear to be highly conserved.
3.8. Phylogenetic Analyses
To investigate the phylogenetic utility of mitochondrial genes and the position of H. yongfuensis within Lamiales, a phylogenetic tree was constructed using DNA sequences of the 25 PCGs from the 54 Lamiales species whose mitochondrial genomes were publicly available. Two species from the order Solanales were used as outgroups. The selected PCGs included atp1, atp4, atp6, atp8, atp9, ccmB, ccmC, cob, cox1, cox2, cox3, matR, mttB, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9, rps3, and rps12. The results (Figure 10) indicate that all eleven families in the order are monophyletic. The monophyly of Gesneriaceae is strongly supported (BS = 100), with H. yongfuensis nested within this family. Within Gesneriaceae, H. yongfuensis is sister to O. esquirolii, although with weak support (BS = 50); this clade groups with P. hunanensis (BS = 95); the resulting lineage is sister to B. hygrometrica (BS = 99); and the clade of these four taxa is sister to Hab. rhodopensis (BS = 100).
4. Discussion
Currently, complete mitogenome data for Gesneriaceae remain extremely limited: only four species, B. hygrometrica, Hab. rhodopensis, P. hunanensis, and O. esquirolii [5,17,18,19], have had their mitogenomes completely determined. To better understand the diversity and evolutionary characteristics of Gesneriaceae mitogenomes, we sequenced, assembled, and systematically analyzed the complete mitochondrial genome of H. yongfuensis. The assembled mitogenome has a linear structure and is the first complete mitogenome obtained for the genus Hemiboea. Comparison with previously reported Gesneriaceae mitogenomes shows that, despite substantial variation in genome size across genera, from 454,871 bp (O. esquirolii) to 619,997 bp (H. yongfuensis), the GC content remains highly consistent (ranging from 43.54% in P. hunanensis to 43.8% in O. esquirolii). This is consistent with the hypothesis of a relatively stable GC content in higher plant genomes during evolution [12]. Furthermore, the reported Gesneriaceae mitogenomes include both linear structures (H. yongfuensis, P. hunanensis, and O. esquirolii) and circular structures (B. hygrometrica and Hab. rhodopensis), further corroborating the view that plant mitogenome structures are highly dynamic and complex.
Despite the significant variation in the size and structure of plant mitogenomes, the number of mitochondrial genes among terrestrial plants has remained relatively stable [50,51]. We therefore evaluated the conservation of the H. yongfuensis mitogenome at the level of gene composition. Our annotation indicates that the mitogenome of H. yongfuensis comprises 61 genes: 37 PCGs, 21 tRNAs, and 3 rRNAs. The 37 PCGs are shared by most angiosperms [6]. This relative stability in gene content provides a basis for examining other features that contribute to mitogenome diversity. Differences in the frequencies of synonymous codon usage can offer valuable supplementary data for phylogenetic analyses [52]. Our investigation of codon usage in the 37 PCGs showed that a single amino acid can be encoded by as many as six codons (as for leucine, serine, and arginine) and that the synonymous codons for each amino acid are used at distinctly different frequencies. Among codons with an RSCU value greater than 1, a clear preference for adenine or thymine (A/T) at the third codon position is evident, a trend commonly found in higher plants [53].
The migration of chloroplast sequences into the mitogenome is common in higher plants [54,55,56,57,58]. These transferred sequences, known as mitochondrial plastid DNA transfers (MTPTs), have been integrated into the mitogenome and contribute to its complexity and diversity. H. yongfuensis also exhibits such transfer, consistent with findings from other species in Gesneriaceae [5]. Specifically, the transferred segments in the H. yongfuensis mitogenome total 72,759 bp, or 11.74% of the mitogenome, slightly above the commonly reported range of 3% to 11.5% [59,60]. Previous research indicates that genes migrating from chloroplasts to mitochondria often become pseudogenes over time, losing their function through processes such as sequence recombination [61]. Interestingly, the MTPT fragments identified in H. yongfuensis contain multiple intact PCGs and tRNA genes. Their integrity may indicate that the transfer events are relatively recent or that the sequences have been conserved during evolution, in which case they may still be functional. However, more evidence is needed to determine whether they are expressed and perform a function. Additionally, the mechanisms underlying intergenomic sequence migration remain to be investigated [62].
In addition to intergenomic transfer, transcript-level modification may also contribute to mitochondrial functional variation [63]. RNA editing, which in mitochondrial gene transcripts consists primarily of the conversion of cytosine (C) to uracil (U), affects protein structure and function and thus plays a crucial role in evolution [62,64]. In the H. yongfuensis mitogenome, 429 RNA editing sites were predicted across the 37 PCGs. Among Gesneriaceae species, this count is greater than that reported for Hab. rhodopensis (419 sites) but lower than that for P. hunanensis (455 sites) [5,17,18]. In H. yongfuensis, 47.09% of the predicted edits change the encoded amino acid from hydrophilic to hydrophobic. Such changes may assist protein folding and the formation of secondary structures, which help optimize protein conformation and function [65,66]. The biological significance of RNA editing extends beyond the regulation of gene expression; it also plays a crucial role in the adaptive evolution of plants. Research indicates that the rate of non-synonymous substitutions due to C-to-U editing events is higher than that of synonymous substitutions. Moreover, these editing sites show greater conservation at the DNA level, suggesting that they have an adaptive role under natural selection [67]. Additionally, RNA editing is closely linked to plant development, the regulation of flowering time, and responses to environmental stresses [62]. In stressful environments, such as limestone areas, plants require a stable energy supply and effective stress responses, and this adaptability is determined by a combination of mechanisms [68,69,70]. H. yongfuensis is adapted to the limestone habitat and differs in flowering period from its known sister species Hem. subcapitata from the same areas [71]. However, whether RNA editing has contributed to the divergence and adaptation of the species merits further investigation.
Plant mitogenomes contain many repetitive sequences, including simple, tandem, and dispersed repeats [72,73,74], which affect their size and structure. These repeats can influence genome evolution and diversity by enabling recombination and gene transfer. Previous studies have shown that mitogenome rearrangements are highly dynamic across lineages and may also be shaped by evolutionary processes. For instance, in Brassica, mitogenome rearrangements are associated with selection during domestication and breeding, suggesting they may be linked to adaptation under environmental stress [75]. In the mitogenome of H. yongfuensis, the monomeric and dimeric SSRs, which together account for nearly half of all SSRs, consist predominantly of A/T and AT/TA repeats. This feature is consistent with two other species in the Gesneriaceae family [5,19] and aligns with a general trend observed in most terrestrial plants [76,77,78].
In angiosperms, mitogenome data have proven effective at clarifying deep relationships, serving as a valuable resource for phylogenetic and evolutionary studies [79,80]. In this study, we used mitogenomes to determine the phylogenetic position of H. yongfuensis within Gesneriaceae and within Lamiales. The results show that B. hygrometrica, Hab. rhodopensis, P. hunanensis, O. esquirolii, and H. yongfuensis form a highly supported clade, with a bootstrap support (BS) of 100. Notably, H. yongfuensis is most closely related to O. esquirolii, which is consistent with previous studies based on chloroplast genomes [81]. However, the bootstrap support for this relationship is relatively low (BS = 50), so it requires further verification using nuclear genes. Additionally, because far fewer plant mitochondrial genomes than chloroplast genomes are available, more comprehensive sampling of mitogenomes is required for better resolution of the phylogeny. Furthermore, nucleotide diversity (Pi) varied considerably among the mitochondrial PCGs of H. yongfuensis, ranging from complete conservation in atp9 (Pi = 0) to higher variability in cox2 (Pi = 0.033) and nad2 (Pi = 0.025). This variation may reflect different selective pressures resulting from functional constraints and evolutionary processes. However, adaptive interpretations remain uncertain without more extensive sampling and integrated analyses that explicitly link genomic variation to ecological differences.
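For readers unfamiliar with Pi, a minimal sketch of the usual definition is the average number of pairwise differences per site across a set of aligned homologous sequences. The function below assumes a gap-free alignment of equal-length sequences; the name `nucleotide_diversity` and the toy data are hypothetical and do not reproduce the software or settings used in this study.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site for equal-length aligned sequences.

    Assumption: the alignment is gap-free, so every column is compared.
    """
    n = len(aligned)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and of equal length")
    total_diffs = 0
    for a, b in combinations(aligned, 2):
        # Count mismatched columns for this pair of sequences.
        total_diffs += sum(x != y for x, y in zip(a, b))
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Identical sequences give Pi = 0, the pattern reported above for atp9.
print(nucleotide_diversity(["ACGT", "ACGT", "ACGT"]))
# One variable site among three sequences of length 4.
print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```

Under this definition, a gene with Pi = 0 has no variable sites among the compared sequences, while larger values indicate more pairwise differences per aligned site.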
5. Conclusions
In this study, we assembled and annotated the complete mitogenome of H. yongfuensis, the first complete mitogenome reported for the genus Hemiboea. The mitogenome has a linear structure, is 619,997 bp long, and includes 37 PCGs, 21 tRNA genes, and 3 rRNA genes. The study revealed genomic rearrangements and frequent plastid-derived DNA insertions, whereas gene content, GC composition, and codon usage patterns are largely preserved. Although the taxonomic sampling in this study was limited, the findings indicate that mitochondrial data can provide valuable phylogenetic insights. Overall, these results add to the limited mitochondrial DNA data available for Gesneriaceae, enhance our understanding of the family's evolution, and provide important genetic data for taxonomic classification, systematic and evolutionary studies, and species conservation.
References
- 1. Sun M.Y., Zhang M.Y., Chen X.N., Liu Y.Y., Liu B.B., Li J.M., Wang R.Z., Zhao K.J., Wu J. Rearrangement and domestication as drivers of Rosaceae mitogenome plasticity. BMC Biol. 2022, 20, 181. doi:10.1186/s12915-022-01383-3. PMID: 35986276. PMCID: PMC9392253.
- 2. Møller I.M., Rasmusson A.G., Van Aken O. Plant mitochondria – past, present and future. Plant J. 2021, 108, 912–959. doi:10.1111/tpj.15495. PMID: 34528296.
- 3. Cheng Y., He X.X., Priyadarshani S.V.G.N., Wang Y., Ye L., Shi C., Ye K.Z., Zhou Q., Luo Z.Q., Deng F. Assembly and comparative analysis of the complete mitochondrial genome of Suaeda glauca. BMC Genom. 2021, 22, 167. doi:10.1186/s12864-021-07490-9. PMID: 33750312. PMCID: PMC7941912.
- 4. Wallace D.C., Singh G., Lott M.T., Hodge J.A., Schurr T.G., Lezza A.M.S., Elsas L.J., Nikoskelainen E.K. Mitochondrial DNA mutation associated with Leber’s hereditary optic neuropathy. Science 1988, 242, 1427–1430. doi:10.1126/science.3201231. PMID: 3201231.
- 5. Chen L.L., Dong X., Huang H., Xu H.X., Rono P.C., Cai X.Z., Hu G.W. Assembly and comparative analysis of the initial complete mitochondrial genome of Primulina hunanensis (Gesneriaceae): A cave-dwelling endangered plant. BMC Genom. 2024, 25, 322. doi:10.1186/s12864-024-10247-9. PMID: 38561677. PMCID: PMC10983754.
- 6. Wang Y., Chen S.J., Chen J.J., Chen C.J., Lin X.J., Peng H., Zhao Q., Wang X.Y. Characterization and phylogenetic analysis of the complete mitochondrial genome sequence of Photinia serratifolia. Sci. Rep. 2023, 13, 770. doi:10.1038/s41598-022-24327-x. PMID: 36641495. PMCID: PMC9840629.
- 7. Kozik A., Rowan B.A., Lavelle D., Berke L., Schranz M.E., Michelmore R.W., Christensen A.C. The alternative reality of plant mitochondrial DNA: One ring does not rule them all. PLoS Genet. 2019, 15, e1008373. doi:10.1371/journal.pgen.1008373. PMID: 31469821. PMCID: PMC6742443.
- 8. Cole L.W., Guo W.H., Mower J.P., Palmer J.D. High and variable rates of repeat-mediated mitochondrial genome rearrangement in a genus of plants. Mol. Biol. Evol. 2018, 35, 2773–2785. doi:10.1093/molbev/msy176. PMID: 30202905.
