Assembly and Characterization of the Complete Mitochondrial Genome of Flemingia philippinensis Merr. et Rolfe
Jingli Huang, Langping Liao, Yuwei Pan, Zhihong Chen, Dong Xiao, Jie Zhan, Longfei He, Aiqin Wang

TL;DR
This study sequenced and analyzed the complete mitochondrial genome of Flemingia philippinensis, revealing its genetic structure and evolutionary insights.
Contribution
The paper provides the first complete mitochondrial genome assembly and detailed characterization for Flemingia philippinensis.
Findings
The mitogenome is circular, 427,353 bp long with 44.90% GC content.
It contains 33 protein-coding genes, 16 tRNA, and 3 rRNA genes.
RNA editing identified 498 C-to-U sites, with notable enrichment in nad4 and ccmB.
Abstract
Flemingia philippinensis Merr. et Rolfe (F. philippinensis) is a Chinese herbal medicine rich in polyphenols, especially isoflavone derivatives. It exhibits potent anti-inflammatory properties and is widely used in the treatment of various diseases. In this study, we aimed to sequence, assemble, and analyze the mitogenome of F. philippinensis in detail to understand the genetic structure of its organelles and their gene expression. The results showed that the mitogenome of F. philippinensis possesses a circular architecture with a total length of 427,353 bp and a GC content of 44.90%. Annotation results revealed 33 unique protein-coding genes (PCGs), 16 transfer RNA (tRNA), and 3 ribosomal RNA (rRNA) genes in the mitogenome. Furthermore, comparative analysis of mitogenome and chloroplast genome (cpgenome) sequences identified six mitochondrial plastid sequences (MTPTs), including one…
Funding
- National Natural Science Foundation of China
- China Agricultural Research System of the Ministry of Finance and the National Agricultural Research Center
Taxonomy
Topics: Photosynthetic Processes and Mechanisms · Genomics and Phylogenetic Studies · Biological and pharmacological studies of plants
1. Introduction
F. philippinensis prostrata is native to China. It is an erect or spreading subshrub of the genus Flemingia in the Fabaceae family. Currently, it is mainly distributed in South China, Central China, Southwest China, and other regions in China. There are 16 species and 1 variety of the genus Flemingia in China [1,2]. F. philippinensis is rich in polyphenols, especially isoflavone derivatives [3,4,5] and compounds such as genistein [6]. The potential of these compounds as active ingredients for cardiovascular protection and anti-inflammation has been confirmed in pharmacological studies [5].
Mitochondria are the “power stations” of cells and are typical organelles inherited maternally [7]. Mitochondria and chloroplasts are organelles in plant cells that contain genetic material in addition to the nucleus. Their genomes evolve independently of the nuclear genome [8]. The main function of mitochondria is to synthesize ATP through oxidative phosphorylation and produce metabolic intermediates for various cellular processes [9,10,11]. In addition to providing cellular energy, mitochondria also play important roles in processes such as cell differentiation, cell signaling, apoptosis, cell growth regulation, and the cell cycle [12,13]. The mitogenome size of angiosperms ranges from 66 kb to 11.7 Mb [14,15]. The protein-coding genes of the mitogenome are relatively conserved across species [16,17]. This is closely related to the adaptation of species to global environmental fluctuations and the evolutionary stability of species, which is mainly reflected in the GC content of the mitogenome in higher plants [18]. The mitogenome contains a large number of exogenous sequences from the nuclear and chloroplast genomes [19], as well as tandem and non-tandem repetitive DNA sequences. A large number of repetitive sequences may lead to frequent homologous recombination events [20,21,22], thus forming a series of different genomic structures. In summary, mitochondria are essential organelles in plants and constitute the extrachromosomal genetic system of plant cells. The mitogenome reflects the conservative evolutionary patterns and evolutionary rates of plants, and thus, mitochondria possess considerable research significance in the fields of molecular evolution and molecular ecology. To date, only the chloroplast genomes of Flemingia macrophylla (Willd.) [23] and Flemingia stricta Roxb. ex Ait. 1812 [24] have been reported among species of the genus Flemingia. However, both the mitochondrial and chloroplast genomes of F. philippinensis remain completely unknown. The assembly and annotation of the mitogenome of F. philippinensis will facilitate the elucidation of genetic identification, phylogenetic evolution, and taxonomic classification of F. philippinensis and other congeneric Flemingia species.
The plant mitogenome is generally a single circular molecule [25]. Other forms also exist, such as linear molecules [26,27,28], two or more circular structures [29,30,31], a circular plus branched form [32], multiple coexisting structures [6], and multi-branched conformations [15,33]. The structure of the mitogenome differs significantly among Fabaceae plants. Perennial soybean species exhibit distinct differences in mitogenome structure from annual species, as well as from the wild and cultivated varieties of annual soybeans. This variation is directly associated with the transfer patterns of nuclear plastid DNAs (NUPTs) and nuclear mitochondrial DNAs (NUMTs) into the nuclear genome, together with the high-frequency recombination of intragenomic repetitive sequences. Soybean varieties with more fragmented NUPT and NUMT fragments integrated into their nuclear genomes possess a simpler mitochondrial genome structure, as exemplified by perennial soybeans [34]. Large repeated sequences may mediate the formation of multiple subgenomic circles from the mitogenome sequence, thereby constituting a molecular pool. Among higher plants, soybean has the largest number of large repeated sequences, so the molecular pool of its mitogenome is also the most complex [35]. The structure of plant mitogenomes can be altered through homologous recombination of long repetitive sequences. For instance, the two large circular molecules present in the mitogenome of Tylosema esculentum can be reversibly converted into five basic molecular forms via this mechanism [36].
In this study, we sequenced and annotated the mitogenome of F. philippinensis and elucidated multiple core characteristics of its mitochondrial genome, including the genomic structure, codon usage bias, repetitive sequences, sequences homologous to the chloroplast genome, and RNA editing sites. Additionally, we constructed a phylogenetic tree using the PCGs of the mitogenomes of 47 species to assess whether it can be used to infer the phylogenetic relationships of F. philippinensis. A multiple synteny plot was also used to compare eight closely related species to determine whether the mitogenome has undergone gene recombination and rearrangement during evolution. We hope that the study of the mitogenome in F. philippinensis will expand our understanding of the genus Flemingia and provide a theoretical basis for the further utilization of F. philippinensis germplasm resources. Moreover, it provides valuable resources for future research on molecular diversity and mitochondrial gene expression.
2. Results
2.1. Assembly and Annotation of the F. philippinensis Mitogenome
The high-quality mitogenome of F. philippinensis was assembled and annotated with Illumina and Nanopore sequencing data (Figure 1). The total length of the mitogenome of F. philippinensis is 427,353 bp with 44.90% GC content, and the complete mitogenome is a single circular molecule (Figure 2). Within the genome, the total length of PCGs is 29,391 bp, while the tRNA genes total 1428 bp and the rRNA genes 5250 bp. Notably, the non-coding regions span 391,284 bp, accounting for 91.56% of the total length (Table 1). The length and sequencing depth of each node are shown in Table 2.
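The composition figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the three annotated feature classes do not overlap; the function name `noncoding_summary` is illustrative and not part of the study's pipeline.

```python
# Lengths reported in Section 2.1, in base pairs.
TOTAL_BP = 427_353
PCG_BP = 29_391
TRNA_BP = 1_428
RRNA_BP = 5_250


def noncoding_summary(total, *feature_lengths):
    """Return (non-coding length, percent of total).

    Assumes the feature classes are non-overlapping, so their lengths
    can simply be subtracted from the genome length.
    """
    noncoding = total - sum(feature_lengths)
    return noncoding, round(100 * noncoding / total, 2)


noncoding_bp, noncoding_pct = noncoding_summary(TOTAL_BP, PCG_BP, TRNA_BP, RRNA_BP)
print(noncoding_bp, noncoding_pct)
```

Under that assumption the result reproduces the 391,284 bp and 91.56% stated in the text.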
We conducted a comprehensive annotation of the mitogenome of F. philippinensis, and a total of 52 genes were identified, including 33 protein-coding genes (PCGs), 16 tRNA genes, and 3 rRNA genes. Table 3 provides a detailed breakdown of the functional classification of these genes. Among the 33 PCGs, 24 were core genes, encompassing five ATP synthase genes (atp1, atp4, atp6, atp8, and atp9); nine NADH dehydrogenase genes (nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, and nad9); four cytochrome C biogenesis genes (ccmB, ccmC, ccmFC, and ccmFN); three cytochrome C oxidase genes (cox1, cox2, and cox3); one membrane transport protein gene (mttB); one maturation enzyme gene (matR); and one ubiquinol-cytochrome C reductase gene (cob). The remaining nine PCGs were non-core genes, which included two ribosomal large subunit genes (rpl5, rpl16); six ribosomal small subunit genes (rps1, rps3, rps4, rps10, rps12, rps14); and one succinate dehydrogenase gene (sdh4).
2.2. Codon Usage Analysis of PCGs
Eukaryotes possess 20 canonical amino acids, which are encoded by 61 sense codons. In addition, three codons (UAA, UGA, UAG) do not encode any amino acid and serve as termination signals for protein translation. A total of 9797 codons were identified in the PCGs of the mitogenome of F. philippinensis, covering all 64 codon types and encoding 20 amino acids. The codon usage patterns of each amino acid are presented in Figure 3 and Supplementary Table S1. Except for methionine (Met, AUG) and tryptophan (Trp, UGG), which are each encoded by a single codon, all the other amino acids are encoded by at least two synonymous codons; notably, arginine (Arg), leucine (Leu), and serine (Ser) each have six synonymous codons (Figure 3). Codon bias is shaped by mutation pressure and natural selection, and codons with an RSCU value greater than 1 are defined as the preferentially used codons for their corresponding amino acids. With the exception of the initiation codon AUG and the tryptophan codon UGG (both with an RSCU value of 1), widespread codon usage bias was observed in the PCGs of the F. philippinensis mitogenome. Among the identified codons, 29 were classified as high-frequency codons (RSCU > 1). Specifically, the CAA codon for glutamine (Gln) exhibited the highest RSCU value of 1.54, followed by the GCU codon for alanine (Ala) with an RSCU value of 1.51. Furthermore, among the 20 amino acids, Leu was encoded by the largest number of codons (1044), accounting for 10.66% of the total codons, followed by Ser with 900 codons (9.19% of the total). In contrast, cysteine (Cys) was encoded by the fewest codons (only 139), representing 1.42% of the total codons. In terms of individual codons, UUU had the highest occurrence frequency with 372 instances, followed by AUU with 326 instances (Supplementary Table S1).
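To make the RSCU definition concrete, the sketch below computes RSCU as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. It is an illustration only: it assumes the standard genetic code, and the helper names (`count_codons`, `rscu`) and the toy sequence are hypothetical, not the MEGA workflow used in the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" marks stop codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA letters)."""
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(codon_counts):
    """RSCU = observed count / (family total / number of synonymous codons)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid (or stop) absent from the input
        expected = total / len(codons)
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values


# Toy sequence: Met, Gln (CAA x3, CAG x1), Trp.
toy_rscu = rscu(count_codons("ATGCAACAACAACAGTGG"))
print(toy_rscu["CAA"], toy_rscu["CAG"], toy_rscu["AUG"])
```

In the toy input, CAA is used three times and CAG once, so CAA is the preferred glutamine codon (RSCU above 1), while the single-codon amino acids Met and Trp stay at exactly 1, matching the behaviour described in the text.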
2.3. Repeats and SSR Analysis
To explore the potential role of repetitive sequences, we analyzed three types of repetitive sequences in the mitogenome of F. philippinensis: simple sequence repeat (SSR), tandem repeat, and dispersed repeat. In our investigation of F. philippinensis, a total of 134 SSRs were identified (Supplementary Table S2). Monomeric and dimeric forms accounted for 50.00% of the total SSRs. Interestingly, among 44 monomeric SSRs, thymine (T) accounts for a high proportion, amounting to 56.82% (Figure 4A). Among dimeric SSRs, the AT, TA, and AG motifs accounted for 60.9% of the total, whereas only three hexameric SSRs were identified. Meanwhile, a total of nine tandem repeat sequences with lengths ranging from 10 to 25 bp and a similarity of over 91% were detected (Supplementary Table S3). In addition, we further characterized the dispersed repeat sequences in the mitogenome of F. philippinensis. As a result, 957 pairs of repeat sequences with a length of no less than 30 bp were identified, among which 235, 393, 227, and 102 pairs were palindromic repeats, forward repeats, reverse repeats, and complementary repeats, respectively (Figure 4B, Supplementary Table S4). The longest palindromic repeat was 739 bp in length, and the longest forward repeat reached 4805 bp.
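As an illustration of how SSRs of different motif lengths can be located, the following sketch scans a sequence with regular expressions. The minimum repeat thresholds (10, 5, 4, 3, 3, 3 for mono- to hexanucleotide motifs) are an assumption chosen for the example; the text does not state the MISA settings that were used, and this simplified scan is not a substitute for MISA.

```python
import re

# Assumed thresholds: motif length -> minimum number of repeats (not taken from the paper).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. 'ATAT' or 'TT')."""
    n = len(unit)
    return not any(n % k == 0 and unit[:k] * (n // k) == unit for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples with 0-based start positions."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):
                hits.append((match.start(), unit, len(match.group(0)) // size))
    return sorted(hits)


toy_seq = "GC" + "T" * 12 + "GCG" + "AT" * 6 + "GG"
toy_ssrs = find_ssrs(toy_seq)
print(toy_ssrs)
```

The primitive-motif check keeps a run such as TTTTTTTTTTTT from being reported again as a dimeric or trimeric SSR, so each locus is classified by its shortest motif.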
2.4. Chloroplast-Derived Mitogenomic Sequences
The transfer of intracellular genetic material is a common phenomenon during the evolution of higher plants. To investigate the migration of genetic material from chloroplasts to mitochondrial organelles, it is necessary to perform sequencing and annotation of the plant chloroplast genome. The chloroplast genome of F. philippinensis assembled in this study has a size of 158,638 bp (Figure 5A). Based on sequence similarity analysis, a total of six homologous fragments between the mitochondrial and chloroplast genomes were identified in F. philippinensis, with lengths ranging from 77 to 1447 bp (Figure 5B, Supplementary Table S5). These fragments had a total length of 1936 bp, accounting for 0.45% of the total length of the mitogenome. Among these six homologous fragments, MTPT6 was the longest, with a length of 1447 bp. Through annotation of these homologous sequences, five complete genes were identified on the six homologous fragments, all of which were tRNA genes (trnD-GUC, trnH-GUG, trnM-CAU, trnN-GUU, trnW-CCA) (Supplementary Table S5).
2.5. Phylogenomic Analysis
To explore the evolutionary relationships of F. philippinensis, we constructed a maximum likelihood tree for 47 species belonging to two orders (Fabales and Zygophyllales) based on the DNA sequences of 21 conserved mitochondrial PCGs (Figure 6). The mitogenome information of the individual plant species can be found in Supplementary Table S6. The 21 shared PCGs were atp4, atp6, atp8, atp9, ccmB, ccmC, ccmFC, ccmFN, cob, cox1, cox3, matR, nad1, nad2, nad3, nad5, nad6, nad7, nad9, rpl5, and rps12. The results showed that F. philippinensis belongs to the family Fabaceae of the order Fabales and has a relatively distant genetic relationship with plants of other genera in Fabaceae. F. philippinensis belongs to the subfamily Faboideae, and it falls into the same major clade as groups such as Vigna, Glycine, Phaseolus, and Psophocarpus. It is noteworthy that the two mitogenomes of Zygophyllales form an outgroup.
2.6. Colinearity Analysis
Colinearity analysis based on homologous gene alignment or sequence comparison can be used to explore the evolutionary relationships among species. We employed the BLASTn program to compare homologous genes and their sequence arrangements. Based on sequence similarity, MCScanX was used to draw the multiple synteny plot between F. philippinensis and its closely related species. Detailed results are presented in Supplementary Table S7 and Figure 7, with syntenic blocks shorter than 0.5 kb excluded from the visualization for clarity. A large number of homologous syntenic blocks were detected between F. philippinensis and its closely related species. Notably, the mitogenomes of these eight species exhibited inconsistent syntenic block arrangement patterns. The mitogenome of F. philippinensis, along with those of Vigna radiata, Phaseolus vulgaris, Glycine max, Pueraria montana, Psophocarpus tetragonolobus, Apios americana, and Millettia pinnata, may have undergone extensive genomic rearrangements during the evolutionary process, leading to structural non-conservation of their mitochondrial genomes.
2.7. RNA-Editing Sites Prediction
In this study, we predicted a total of 498 potential RNA editing sites in the 33 PCGs of the F. philippinensis mitogenome with a cutoff value set at 0.9, all of which were C-to-U transitions. Among these sites, 17 (3.4%) were synonymous substitution sites, and the rest were non-synonymous substitution sites (Supplementary Table S8). Notably, the nad4 gene had the highest number of RNA editing sites among all mitochondrial genes (44), followed by the ccmB gene (33), while the atp1 gene had only one RNA editing event (Figure 8). Furthermore, four sites generated stop codons in the atp6, ccmFC, cox2, and rps10 genes. The atp6 site was a conversion from CAA to UAA, turning a glutamine (Gln) codon into a stop codon; in the other three genes, an arginine (Arg) codon was converted into a stop codon. In addition, six sites generated start codons in the nad1, nad4L, nad5, nad7, rps1, and rps10 genes; all of these conversions were from ACG to AUG, turning a threonine (Thr) codon into a start codon (Supplementary Table S8).
Significantly, we observed that the amino acid changes were mainly caused by editing at the first and second base positions of the codons, and most of them were single-position changes. Changes at the second base position were the most frequent, accounting for 59.4% of total RNA editing sites, followed by changes at the first base position (30.7%), whereas changes at both positions accounted for only 6.6%. Most of the resulting amino acid changes were transitions from hydrophilic amino acids to hydrophobic amino acids, such as leucine (Leu) to phenylalanine (Phe), arginine (Arg) to tyrosine (Tyr), proline (Pro) to leucine (Leu), serine (Ser) to phenylalanine (Phe), serine (Ser) to leucine (Leu), alanine (Ala) to valine (Val), threonine (Thr) to methionine (Met), and proline (Pro) to phenylalanine (Phe). These transformations account for 78.5% of RNA editing sites and may contribute to improving protein stability (Supplementary Table S8).
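The consequences of individual C-to-U edits described in this section can be reproduced by translating a codon before and after the edit. The sketch below assumes the standard genetic code; the function `edit_c_to_u` and its return format are illustrative and unrelated to the prediction tool used in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" marks stop codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def edit_c_to_u(codon, pos):
    """Apply a C-to-U edit at codon position `pos` (0, 1 or 2).

    Returns (edited codon, amino acid before, amino acid after, effect).
    """
    codon = codon.upper().replace("T", "U")
    if codon[pos] != "C":
        raise ValueError(f"position {pos} of {codon} is not a C")
    edited = codon[:pos] + "U" + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    if before == after:
        effect = "synonymous"
    elif after == "*":
        effect = "stop gained"
    else:
        effect = "non-synonymous"
    return edited, before, after, effect


print(edit_c_to_u("CAA", 0))  # first-position edit, as at the atp6 site
print(edit_c_to_u("ACG", 1))  # second-position edit that creates an AUG codon
print(edit_c_to_u("UUC", 2))  # third-position edit
```

The three calls cover one edit at each codon position: CAA (Gln) becomes the stop codon UAA, ACG (Thr) becomes AUG (Met), and UUC to UUU leaves phenylalanine unchanged, which illustrates why third-position edits are typically synonymous.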
3. Discussion
3.1. Sequencing, Assembly, and Annotation of the Mitogenome of F. philippinensis
With the continuous advancement and refinement of sequencing technologies, the complete genomes of Fabaceae species have been successfully assembled and annotated, laying a foundation for and providing novel directions for comprehensive research on Fabaceae. F. philippinensis is a valuable traditional Chinese medicinal herb with substantial economic value and broad development potential. The mitogenome plays a pivotal role in plant biomass accumulation and is indispensable for understanding, improving, and utilizing plants. Therefore, we sequenced the mitogenome of F. philippinensis and successfully achieved its complete assembly and annotation. In this study, we generated both Illumina short reads and Oxford Nanopore Technologies (ONT) long reads for mitogenome sequencing. A total of 11.99 Gb of ONT long-read data were produced, with an average read length of 12,429.7 bp and a mean sequencing coverage of the mitogenome of 79.86×. During genome assembly, the assembled sequences were subjected to BLAST searches against the NCBI nucleotide database, and potential nuclear DNA contaminants were manually removed. The final mitogenome was retained based on two criteria: the sequence was fully contiguous and formed a closed circular structure with no isolated dead nodes; and the sequence showed homology to known mitochondrial DNA in BLAST searches or contained complete mitochondrial coding genes. Using this approach, we ensured that the final mitogenome contained all known core mitochondrial PCGs and was structurally assembled into a circular DNA molecule. However, the structure of plant genomes is dynamic [37]. Repetitive sequences, especially long repetitive sequences, can mediate genomic recombination, resulting in the formation of different conformations. A single circular molecule is therefore insufficient to fully represent the structure of the mitogenome of F. philippinensis. It may be reversibly divided into two single rings mediated by long repeat sequences (Supplementary Figure S1). In future work, the potential conformations of the F. philippinensis mitogenome can be verified by PCR amplification.
The total length of the F. philippinensis mitogenome is 427,353 bp, harboring 33 PCGs (Figure 1, Table 3). This differs somewhat from the previously reported sizes of mitogenomes in Fabaceae species. Specifically, the mitogenome length of Vicia has been reported to be approximately 400 kb [38], while that of Pueraria is about 456 kb [39] and that of Medicago is roughly 300 kb; internal exchange of genetic information is known to maintain plant genome sizes within a certain range [40]. Repetitive sequences and unique sequences are recognized as the key factors contributing to variations in mitogenome size across different varieties [35].
Gene loss is a prevalent evolutionary event during the evolution of higher plants. For instance, mycoheterotrophic and parasitic plants exhibit contraction of multiple gene families, which may be associated with their heterotrophic lifestyle, and the degree of gene loss is positively correlated with the increased level of heterotrophy or parasitism [41]. During the evolutionary process of wild emmer wheat (Triticum turgidum ssp. dicoccoides), partial gene loss has occurred due to hybridization, polyploidization, domestication, mutation, and other factors; consequently, modern cultivated wheat does not retain all the genes of its ancestral progenitors [42]. All these phenomena are typical outcomes of plant evolutionary processes. Most PCGs are relatively conserved in the mitogenomes of different plant species, whereas ribosomal protein genes and succinate dehydrogenase genes show a relatively high mutation frequency. A total of 33 PCGs were identified in the mitogenome of F. philippinensis. Comparative analysis with other leguminous plants revealed that F. philippinensis has lost at least nine genes, including rpl2, rpl10, rps2, rps7, rps11, rps13, rps14, rps19 and sdh3 (Supplementary Table S9). This pattern of gene loss is inconsistent with that reported in Vigna reflexo-pilosa, V. angularis, and V. radiata, which have been documented to lose only seven genes (cox2, rpl2, rpl10, rps2, rps11, rps13, and sdh3) [38]. Meanwhile, previous studies have demonstrated that cox2 loss is exclusive to the Phaseolinae subtribe, while rpl2 has been lost in most Fabaceae plant species [38,43], and the gene loss pattern of F. philippinensis is consistent with this finding. In the future, the identification and genetic improvement of Fabaceae plants could potentially be realized by integrating agronomic traits, quantitative traits, and trait-associated functional genes.
3.2. Phylogenetic Analysis and Collinearity Analysis
Phylogenetic analysis revealed that F. philippinensis exhibits a distant taxonomic relationship with other Fabaceae species. It should be noted that the accuracy of the phylogenetic tree is closely correlated with the number of genomes and genes incorporated in the analysis. At present, the scarcity of reported mitogenome data for Flemingia and its related species in the NCBI database results in certain limitations in the phylogenetic analysis of this genus. Collinearity analysis indicated that the arrangement order of collinear blocks in the mitogenomes of eight closely related species differs substantially. Compared with the mitogenomes of other related species, that of F. philippinensis has undergone extensive genome rearrangements, suggesting that its mitogenome structure is not conserved. Therefore, additional sequencing of mitogenomes from Flemingia species and other Fabaceae taxa is required to comprehensively elucidate the evolutionary patterns and variation characteristics of this genome.
3.3. Analysis of Relative Codon Usage Patterns in Mitogenomes
The standard genetic code comprises 64 codons, which carry crucial recognition and translational information in plants and are therefore of great significance in the context of gene mutations [33]. Codon usage bias is a common phenomenon in genes, which has been confirmed in various prokaryotic and eukaryotic organisms [30,44]. The preference for specific synonymous codons in different species or taxa plays a crucial role in shaping the genetic characteristics of these organisms [45,46] and is considered a product of biological evolution and selection. In this study, except for the start codon AUG and the tryptophan codon (UGG), the remaining 18 amino acids all exhibited a tendency to use codons ending with A/T bases (Supplementary Table S1). Additionally, with the exceptions of the stop codon (UGA), isoleucine (AUA), and leucine (CUA), the RSCU values of all NNT and NNA codons were greater than 1.0. This further indicated a strong preference for adenine (A) or thymine (T) at the third position of codons in the mitochondrial protein-coding genes of F. philippinensis. Consistent with the findings in Phaseolus vulgaris, this A/T preference at the third codon position is a highly prevalent phenomenon across all characterized mitogenomes [43], suggesting a certain degree of similarity in mitochondrial codon usage bias among different species. The codon usage pattern may also enhance the stability of mitochondrial transcripts and the efficiency of protein synthesis, thereby endowing plants with adaptive advantages under diverse conditions [47]. Exploring codon usage patterns helps deepen our understanding of the role of plant mitogenomes in plant evolution and provides a reference for future research on the optimization of mitochondrial gene expression and its impacts on plant growth and metabolite accumulation.
3.4. RNA Editing Sites
RNA editing is a post-transcriptional mechanism in organelles of higher plants and helps improve protein folding [31,48]. RNA editing mainly involves nucleotide insertion, deletion, or substitution, and mainly occurs in coding regions [49,50]. Since Benne et al. [51] discovered RNA editing in 1986, RNA editing sites have been identified in animals, plants, and some viruses, with most organisms exhibiting variable preferred types [48]. In this study, all RNA editing sites are C-to-U edits (Figure 8 and Supplementary Table S8), which is consistent with the extensive occurrence of cytidine (C)-to-uridine (U) RNA editing in angiosperm mitochondria [52,53]. RNA editing is involved in male sterility, seed development, pathogen resistance, etc. [31,54,55] and plays a key role in transcriptional expression. RNA editing is a key regulatory factor of gene expression and has the potential to affect phenotypes [56].
The number of RNA editing sites in genes varies considerably across plant species. Genes with a large number of RNA editing sites may reflect the characteristics of plants under positive selection, as well as the critical roles of these genes in mitochondrial energy metabolism and the adaptive evolutionary demands of plants [57]. In this study, a total of 498 potential RNA editing sites were identified across 33 PCGs. Among the RNA editing sites of F. philippinensis, the substitution frequency at single sites of the second codon base was the highest, which is consistent with previous reports [32,58]. Among the RNA editing sites of F. philippinensis, the nad4 gene contained the largest number of editing sites. The nad4 gene, together with eight other mitochondrial nad genes (nad1, nad2, nad3, nad4L, nad5, nad6, nad7 and nad9), encodes components of Complex I, which is the largest respiratory complex found in the mitochondria of nearly all terrestrial plants [59]. The high frequency of editing sites in nad genes may be crucial for the function of Complex I. Furthermore, we found that four genes (atp6, ccmFC, cox2 and rps10) generated stop codons via RNA editing, thereby altering their open reading frames (ORFs). Among these genes, the cox gene family is relatively conserved during evolution and is widely recognized as encoding key enzymes involved in respiration and biological processes [60]. The editing events of cox2 may serve to modify and refine genetic information, increase the diversity of gene products, and thus contribute to the evolutionary adaptation of F. philippinensis.
4. Materials and Methods
4.1. Plant Materials and Sequencing
Fresh plant samples, donated by the Guangxi Medicinal Botanical Garden, were collected from the Agricultural Science New Town of Guangxi University, Fusui, Guangxi Zhuang Autonomous Region, China (22°38′ N, 107°54′ E). DNA was extracted from the leaves using the Tiangen new plant genomic DNA extraction kit (DNAsecure Plant Kit; Tiangen Biotech Co., Ltd., Beijing, China). Agarose gel electrophoresis and the Nanodrop2000 instrument (Thermo Fisher Scientific, Wilmington, DE, USA) were used to evaluate the integrity and concentration of DNA. The mitogenome of F. philippinensis was sequenced using the Illumina and Nanopore methods.
Qualified genomic DNA samples were subjected to next-generation sequencing (NGS) and Oxford Nanopore sequencing on the DNBSEQ-G400 and the PromethION platform (Oxford Nanopore Technologies, Oxford, UK), respectively.
4.2. Assembly and Annotation of F. philippinensis
The mitogenome of F. philippinensis was assembled based on all sequenced long-reads data. The Flye software (v2.9.2-b1786) [61] was used to assemble the long-reads sequencing data with the default parameters, yielding an initial file in GFA format containing all assembly results.
For all assembled contigs in FASTA format, we constructed a database using makeblastdb. Subsequently, the BLASTn (v2.13.0) program was employed with mitochondrial genes from Arabidopsis thaliana as query sequences to identify contigs containing mitogenome sequences. The parameters used were “-evalue 1e-10 -outfmt 6 -max_hsps 10 -task blastn-short”. Only the contig fragments containing mitochondrial genes were retained. The GFA file of the retained mitochondrial fragments was visualized using Bandage software (v0.8.1) [62]. Subsequently, long-read and short-read data were aligned to the mitochondrial contigs using bwa software (v0.7.17) [63], and the aligned mitochondrial reads were used for subsequent hybrid assembly. The short-read and long-read sequencing data were then hybrid assembled using Unicycler software (v0.4.8) [64] with the following command line: “unicycler -1 filtered_short_reads.R1.fastq -2 filtered_short_reads.R2.fastq -l filtered_long_reads.fasta --kmers 57,67,77,89”. Here, “filtered_short_reads.R1.fastq” and “filtered_short_reads.R2.fastq” are the filtered paired-end short-read data that mapped only to mitochondrial sequences, while “filtered_long_reads.fasta” is the filtered long-read data that mapped to mitochondrial sequences. Finally, we obtained the mitogenome of F. philippinensis and visualized it with Bandage software (v0.8.1).
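The contig-retention step can be illustrated by parsing BLAST tabular output. This is a minimal sketch, assuming the default 12-column "-outfmt 6" layout with the mitochondrial genes as queries and the assembled contigs as subjects; the function name, contig names, and toy rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST "-outfmt 6" tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def contigs_with_hits(tabular_text, max_evalue=1e-10):
    """Map each contig (subject) to the set of query genes hitting it.

    Only hits with an e-value at or below `max_evalue` are kept.
    """
    retained = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(FIELDS, row))
        if float(record["evalue"]) <= max_evalue:
            retained.setdefault(record["sseqid"], set()).add(record["qseqid"])
    return retained


# Toy rows (invented values): two strong hits on contig_1, one weak hit on contig_7.
toy_rows = [
    ("atp1", "contig_1", "92.5", "1500", "100", "3", "1", "1500", "2001", "3500", "0.0", "2100"),
    ("cox1", "contig_1", "95.0", "1580", "70", "2", "1", "1580", "9001", "10580", "3e-45", "1900"),
    ("nad5", "contig_7", "88.0", "30", "3", "0", "1", "30", "101", "130", "1e-3", "40"),
]
toy_text = "\n".join("\t".join(r) for r in toy_rows)
retained = contigs_with_hits(toy_text)
print(retained)
```

Here contig_7 is dropped because its only hit has an e-value above the threshold, while contig_1 is retained together with the genes that support it.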
Arabidopsis thaliana (NC_037304) and Liriodendron tulipifera (NC_021152.1) were selected as reference genomes to annotate the protein-coding genes of the mitogenome of F. philippinensis via Plant Mitochondrial Genome Annotator (PMGA; http://www.1kmpg.cn/pmga/, accessed on 24 February 2026) [65]. The tRNAs of the mitogenome were annotated using the tRNAscan-SE software (v.2.0.11) [66], while the rRNAs were annotated using the BLASTN software (v2.13.0) [67]. Annotation errors in the mitogenome were manually corrected using the Apollo software (v1.11.8) [68], and the annotated genome was visualized using the OGDRAW software (v1.3.1) [69].
4.3. Analysis of Relative Synonymous Codon Usage (RSCU) and Repeated Sequences
We used PhyloSuite (v1.1.16) [70] to extract the protein-coding sequences and MEGA (v12.1.0) [71] to analyze the codon usage bias of the protein-coding genes of the mitogenome and to calculate the RSCU values.
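For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates that definition only; it is not the MEGA implementation. It assumes the standard genetic code and treats the three stop codons as one synonymous family, and the function name `rscu` is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard code is appropriate for the coding sequences being analyzed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list: Iterable[str]) -> Dict[str, float]:
    """Compute RSCU per codon from in-frame coding sequences.

    RSCU(codon) = count(codon) * n_synonymous / total count for the amino acid.
    Amino acids that never occur are omitted from the result.
    """
    counts: Counter = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families: Dict[str, list] = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values: Dict[str, float] = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy input: alanine encoded three times by GCT and once by GCC.
print(rscu(["GCTGCTGCTGCC"])["GCT"])
```

An RSCU above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often.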
Microsatellites (simple sequence repeats), tandem repeats, and interspersed repeats were identified using MISA (v2.1) (https://webblast.ipk-gatersleben.de/misa/, accessed on 24 February 2026) [72], TRF (v4.09) (https://tandem.bu.edu/trf/trf.unix.help.html, accessed on 24 February 2026) [73], and the REPuter web server (https://bibiserv.cebitec.uni-bielefeld.de/reputer/, accessed on 24 February 2026) [69], respectively. The results were visualized using Excel (2021) software and the Circos package (v0.69.9) [74].
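To make the idea of microsatellite detection concrete, the following is a simplified regex-based sketch. It is not MISA and will not reproduce its output exactly. The minimum repeat numbers are illustrative assumptions, since the thresholds used in this study are not stated here, and the function name `find_ssrs` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed thresholds (motif length -> minimum number of repeats); these are
# illustrative values, not the settings reported for this study.
ASSUMED_MIN_REPEATS: Dict[int, int] = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq: str,
              min_repeats: Dict[int, int] = ASSUMED_MIN_REPEATS
              ) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, repeat_count) with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k in sorted(min_repeats):
        n = min_repeats[k]
        # A motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # because those are reported under the shorter motif length.
            if any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k)):
                continue
            hits.append((m.start() + 1, m.end(), motif, len(m.group(0)) // k))
    return hits


print(find_ssrs("GGATATATATATCC"))
```

Compound microsatellites and circular-genome wraparound are not handled in this sketch.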
4.4. Phylogenetic Analysis
We selected a total of 47 species from Fabales and Zygophyllales for phylogenetic analysis and downloaded their complete mitogenomes from the NCBI database. PhyloSuite (v1.1.16) [70] was used to extract the common genes, and MAFFT (v7.505) [75] was used to align each gene separately. The aligned gene sequences were then concatenated. Phylogenetic analysis was conducted using the maximum likelihood (ML) method in IQ-TREE 2 (v2.3.6) [76], combining ModelFinder, tree search, ultrafast bootstrap, and the SH-aLRT test in a single run. The optimal model selected was GTR + F + R4. Finally, iTOL (v6) [77] was used to visualize the resulting tree.
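The concatenation step can be illustrated with a short Python sketch. It assumes each gene has already been aligned and is held as a mapping from species name to aligned sequence. In the study this step was handled with PhyloSuite, so the function name `concatenate_alignments` and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # species name -> aligned sequence for one gene


def concatenate_alignments(gene_alignments: Dict[str, Alignment]
                           ) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-gene alignments into one supermatrix.

    Returns the concatenated sequences plus (gene, start, end) partitions with
    1-based inclusive coordinates. Only species present in every gene are
    kept, mirroring the use of genes shared by all taxa.
    """
    if not gene_alignments:
        raise ValueError("no alignments supplied")
    taxa = sorted(set.intersection(*(set(a) for a in gene_alignments.values())))
    if not taxa:
        raise ValueError("no species is shared by all genes")

    supermatrix = {t: "" for t in taxa}
    partitions = []
    start = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(aln[t]) for t in taxa}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {gene} differ in aligned length")
        length = lengths.pop()
        for t in taxa:
            supermatrix[t] += aln[t]
        partitions.append((gene, start, start + length - 1))
        start += length
    return supermatrix, partitions


# Toy example with two genes and two species.
toy = {
    "atp1": {"sp_A": "ATG-", "sp_B": "ATGC"},
    "cox1": {"sp_A": "GG", "sp_B": "GA"},
}
matrix, parts = concatenate_alignments(toy)
print(matrix, parts)
```

The partition coordinates are kept so that gene boundaries remain traceable in the concatenated matrix.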
4.5. Collinearity Analysis
We used the BLASTn program to identify conserved homologous sequences with the parameters “-evalue 1e-5 -word_size 9 -gapopen 5 -gapextend 2 -reward 2 -penalty -3”, and only homologous sequences longer than 500 bp were retained for the next step of the analysis. We then used MCScanX [78] to generate a multiple synteny plot.
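The length filter applied before synteny plotting can be sketched as below. The example assumes the BLASTn results were written in the 12-column tabular format, which is not stated for this step, and uses the alignment length in column 4 as the measure of fragment length. The `Block` type and the function name `collinear_blocks` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Block(NamedTuple):
    query: str
    subject: str
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    strand: str   # "-" when the subject coordinates were reported reversed
    length: int   # alignment length from column 4


def collinear_blocks(blast_lines: Iterable[str],
                     min_length: int = 500) -> List[Block]:
    """Keep hits whose alignment length is strictly greater than min_length."""
    blocks = []
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty rows
        length = int(fields[3])
        if length <= min_length:
            continue
        q_start, q_end, s_start, s_end = map(int, fields[6:10])
        # In tabular BLASTn output a minus-strand hit has sstart > send.
        strand = "+" if s_start <= s_end else "-"
        blocks.append(Block(fields[0], fields[1], q_start, q_end,
                            min(s_start, s_end), max(s_start, s_end),
                            strand, length))
    return blocks


# Hypothetical rows: a 1200 bp inverted block and a 300 bp hit that is dropped.
rows = [
    "mito_A\tmito_B\t98.5\t1200\t18\t0\t100\t1299\t5200\t4001\t0.0\t2100",
    "mito_A\tmito_B\t97.0\t300\t9\t0\t2000\t2299\t800\t1099\t1e-140\t500",
]
for block in collinear_blocks(rows):
    print(block)
```

Recording the strand separately lets inverted blocks be drawn distinctly in a synteny plot.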
4.6. Sequence Transfer Analysis and RNA Editing Site Prediction
Illumina data were used to assemble the chloroplast genome with GetOrganelle (v1.7.7.0) [79] using the command line “get_organelle_from_reads.py -1 R1.fq.gz -2 R2.fq.gz -o plastome_output -R 15 -k 21,45,65,85,105 -F embplant_pt -t 30”. The chloroplast genome was annotated using CPGAVAS2 [80], and the annotation was corrected using the CPGView software (v1) [81]. We then identified homologous fragments between the mitogenome and the chloroplast genome using the BLASTN software (v2.13.0) [67] with the parameters “-evalue 1e-5 -outfmt 6 -perc_identity 80”, and visualized the results using Circos (v0.69.9) [74].
We submitted the sequences of all PCGs encoded by the mitogenome to Deepred-Mt [82] to predict C-to-U RNA editing sites. All predictions with a probability value greater than 0.9 were retained.
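The probability cutoff and the per-gene tally can be sketched as follows. The input layout, a list of (gene, position, probability) tuples, is an assumption made for illustration and does not reflect the actual Deepred-Mt output format. The function name `summarize_editing_sites` and the example values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Prediction = Tuple[str, int, float]  # (gene, position in CDS, probability)


def summarize_editing_sites(predictions: Iterable[Prediction],
                            min_prob: float = 0.9
                            ) -> Tuple[List[Prediction], Counter]:
    """Keep predictions with probability strictly above min_prob.

    Returns the retained predictions and the number of retained sites per gene.
    """
    kept = [(gene, pos, prob) for gene, pos, prob in predictions
            if prob > min_prob]
    per_gene = Counter(gene for gene, _, _ in kept)
    return kept, per_gene


# Hypothetical predictions; the values are invented for illustration only.
example = [
    ("nad4", 158, 0.98),
    ("nad4", 362, 0.95),
    ("ccmB", 71, 0.91),
    ("ccmB", 128, 0.90),   # exactly 0.9, so not retained
    ("atp9", 20, 0.42),
]
kept, per_gene = summarize_editing_sites(example)
print(per_gene.most_common())
```

Sorting the per-gene counts this way gives a quick view of which genes carry the most predicted sites.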
5. Conclusions
In this study, we sequenced and assembled the complete mitogenome of F. philippinensis and elucidated its fundamental characteristics through codon usage bias analysis, phylogenetic analysis, identification and characterization of repetitive sequences, prediction of RNA editing sites, and investigation of gene transfer fragments. This mitogenome has a total length of 427,353 bp with a GC content of 44.90%, and a total of 33 PCGs were annotated. RNA editing sites were identified in all PCGs, with a total of 498 sites, all of which were C-to-U transitions. The nad4 gene harbored the largest number of RNA editing sites, implying a potentially unique role in the regulation of mitochondrial functions in F. philippinensis. An analysis of repetitive sequences revealed that dispersed repetitive sequences (87%) play a dominant role in genome evolution. Preferentially used high-frequency codons predominantly ended in A/T, which reflects their co-adaptation to the high AT content of the mitogenome. The sequencing and annotation of the F. philippinensis mitogenome provide valuable insights into the adaptive evolution and molecular investigation of its mitochondria and lay a solid theoretical foundation for further research on the evolution and phylogeny of Fabaceae plants.
References
- 1. Editorial Committee of Flora of China, Chinese Academy of Sciences. Flora of China; Science Press: Beijing, China, 1995; Volume 41, p. 313.
- 2. Niu S.L., Tong Z.F., Lv T.M., Wu J., Yu Y., Tian J.L., Song X.R., Wang Q.Y., Zhang X.Y., Hu P. Prenylated isoflavones from the roots of Flemingia philippinensis as potential inhibitors of β-amyloid aggregation. Fitoterapia 2021, 155, 105060. doi:10.1016/j.fitote.2021.105060. PMID: 34637885.
- 3. Wang Y., Kim J.Y., Song Y.H., Li Z.P., Yoon S.H., Uddin Z., Ban Y.J., Lee K.W., Park K.H. Highly potent bacterial neuraminidase inhibitors, chromenone derivatives from Flemingia philippinensis. Int. J. Biol. Macromol. 2019, 128, 149–157. doi:10.1016/j.ijbiomac.2019.01.105. PMID: 30682484.
- 4. Tjahjandarie T.S., Tanjung M., Saputri R.D., Aldin M.F., Susanti R.A., Pertiwi N.P., Wibawa R.S., Halizah I.N. Cytotoxicity evaluation of two new chalcones from the leaves of Flemingia macrophylla (Willd.) Merr. Phytochem. Lett. 2021, 44, 78–81. doi:10.1016/j.phytol.2021.06.006.
- 5. Zhang Y.Z., Li Y.L., Jin Z., He M.F., Tang Y.M., Zhang J., Zhang B. Fabrication of molecularly imprinted polymers based on magnetic covalent organic framework for highly selective and sensitive analysis of genistein in F. Philippinensis. Microchem. J. 2024, 199, 110013. doi:10.1016/j.microc.2024.110013.
- 6. Zhang S., Wang J., He W.C., Kan S.L., Liao X.Z., Jordan D.R., Mace E.S., Tao Y.F., Cruickshank A.W., Klein R. Variation in mitogenome structural conformation in wild and cultivated lineages of sorghum corresponds with domestication history and plastome evolution. BMC Plant Biol. 2023, 23, 91. doi:10.1186/s12870-023-04104-2. PMID: 36782130. PMCID: PMC9926791.
- 7. Liberatore K.L., Dukowic-Schulze S., Miller M.E., Chen C., Kianian S.F. The role of mitochondria in plant development and stress tolerance. Free Radic. Biol. Med. 2016, 100, 238–256. doi:10.1016/j.freeradbiomed.2016.03.033. PMID: 27036362.
- 8. Shen J.S., Li X.Q., Li M.Z., Cheng H.F., Huang X.L., Jin S.H. Characterization, comparative phylogenetic, and gene transfer analyses of organelle genomes of Rhododendron × pulchrum. Front. Plant Sci. 2022, 13, 969765. doi:10.3389/fpls.2022.969765. PMID: 36212362. PMCID: PMC9532937.
