Structural characterization of the plastid genome and comparative mitochondrial genomics of Eupatorium lindleyanum (Asteraceae): evolutionary dynamics and phylogenetic insights
Yan Li, Shilong Chen, Faqi Zhang, Mingze Xia

TL;DR
This study provides the first complete mitochondrial and plastid genome sequences of Eupatorium lindleyanum, offering insights into its evolution and phylogeny.
Contribution
The first complete mitochondrial and plastid genome assembly for Eupatorium lindleyanum, revealing evolutionary dynamics and phylogenetic utility.
Findings
The mitochondrial genome is 299,285 bp with 32 protein-coding genes and 504 RNA editing sites.
Comparative analysis shows significant structural variation in Eupatorium mitochondrial genomes.
Phylogenetic reconstruction using mitochondrial data complements traditional plastid-based approaches.
Abstract
Eupatorium lindleyanum is a widely utilized traditional medicinal plant with diverse pharmacological activities, extensive distribution, and abundant resources, making it a promising candidate for industrial and scientific research. However, limited genomic resources and unclear genetic relationships have hindered its development. To date, molecular studies on E. lindleyanum are scarce, and no detailed mitochondrial genome analyses have been reported for congeneric species. In this study, we present the first complete assembly and annotation of the mitochondrial and plastid genomes of E. lindleyanum, investigating their potential for phylogenetic reconstruction within Asteraceae. Using Illumina NovaSeq and Nanopore PromethION platforms, 13.4 Gb and 12 Gb of raw sequencing data were generated, respectively. The mitochondrial genome of E. lindleyanum is a 299,285 bp circular DNA molecule…
Funding
- Natural Science Foundation of Shandong Province
- Doctoral Research Initiation Fund of Shandong Second Medical University
- Qinghai Provincial Science and Technology Major Project
- Xining Science and Technology Major Project
- CAS - Qinghai on Sanjiangyuan National Park
- CAS "Light of West China" Program (2024)
- Leading talents of the Kunlun talents in Qinghai Province
Background
Mitochondria and plastids are two essential organelles in plant cells. Each possesses its own genetic material and functions in coordination with the nuclear genome to regulate vital cellular physiological processes [1, 2]. Both organelles are semi-autonomous and have evolved independently of the nuclear genome over the course of long-term plant evolution [3]. Mitochondria play a key role in maintaining cellular physiological and metabolic processes [4]. They serve as energy production centers, synthesizing adenosine triphosphate (ATP) through the tricarboxylic acid cycle and oxidative phosphorylation and thus supplying the energy required for cellular activities [5]. In addition to energy production, mitochondria are involved in regulating essential processes such as cell growth, differentiation, apoptosis, and intercellular signaling [6–8]. Since its endosymbiotic origin, the plant mitochondrial genome has evolved rapidly, and it now far exceeds animal mitochondrial genomes in both size and structural complexity [9]. Plant mitochondrial genome size varies widely, from tens of kilobase pairs to several megabase pairs, primarily because of frequent recombination between repetitive sequences and the incorporation of foreign DNA [10, 11]. Although gene number differs considerably among plant species, the set of functional genes is relatively conserved [12, 13]. Structurally, however, plant mitochondrial genomes are highly variable, particularly in gene order and genome architecture, and are more complex than plastid and nuclear genomes [14–16]. The plant mitochondrial genome is therefore an invaluable resource for investigating species origins, genetic diversity, and phylogenetics, but its complex structure poses significant challenges for genome assembly [17]. Consequently, while numerous plant plastid genomes have been sequenced, research on plant mitochondrial genomes remains limited.
To date, only a small number of plant mitochondrial genomes have been fully sequenced and published [18]. Plant mitochondrial genomes differ markedly in structure and content, nucleotide substitution rates, and levels of repeat-mediated recombination. These differences occur not only between species but also within species, in stark contrast to the high conservation of plastid genomes. The study of plant mitochondrial genomes therefore provides a wealth of genetic information for evolutionary analysis, species differentiation, and plant phylogenetics.
Eupatorium lindleyanum DC. is a perennial herb in the genus Eupatorium (Asteraceae). Several species in this genus are widely used in traditional Chinese medicine, including E. chinense L., E. fortunei Turcz., E. heterophyllum DC., E. japonicum Thunb., and E. lindleyanum. E. lindleyanum (known as Yemazhui in Chinese medicine) is noted for pharmacological properties that include blood lipid regulation and anti-atherosclerotic effects, and it is recorded in the Pharmacopoeia of the People's Republic of China [19]. The species is widely distributed across Russia, Korea, Japan, and China, and its natural resources are abundant [20]. However, research on E. lindleyanum has focused primarily on its chemical composition and pharmacological activities, and the lack of genomic resources and the unclear genetic relationships have long hindered its industrial development.
Asteraceae is one of the largest angiosperm families, comprising approximately 24,000 to 35,000 species, or about 10% of all angiosperm species, many of which have significant ornamental and medicinal value [21]. As of December 5, 2024, plastid genomes of 802 Asteraceae species had been published in the National Center for Biotechnology Information (NCBI) database, but mitochondrial genomes had been reported for only 44 species. This stark discrepancy shows how little mitochondrial genome data are available for such a diverse family, which limits comprehensive insight into mitochondrial genome evolution within Asteraceae. In the genus Eupatorium, the complete mitochondrial genome of E. chinense is available in GenBank (accession number: NC_082294), but no publication has described or analyzed it. Given the rich medicinal resources of the genus, there is an urgent need to expand mitochondrial genome data for Eupatorium species to support further research and applications.
In this study, we assembled and annotated the mitochondrial genome of E. lindleyanum using Illumina and Nanopore sequencing data and analyzed its features, repeat sequences, RNA editing sites, and codon usage preferences. We also performed comparative genomic analyses of mitochondrial genome structure within Asteraceae, assembled the plastid genome, and identified homologous fragments shared between the mitochondrial and plastid genomes. The objectives of this study are: (1) to assemble and annotate the mitochondrial and plastid genomes of E. lindleyanum; (2) to analyze structural variation, gene content differences, and evolutionary patterns of the mitochondrial genome in comparison with closely related species; and (3) to investigate the utility of mitochondrial genome data for reconstructing genus-level phylogenetic relationships within Asteraceae.
Materials and methods
Plant materials, DNA isolation and genome sequencing
Fresh leaves of E. lindleyanum were collected in Xinyang City, China (N 31.82°, E 114.05°). The species was identified by Dr. Mingze Xia, and the voucher specimen (sampling number: Xia2023019) is deposited in the Medicinal Plant Herbarium of the School of Pharmacy, Shandong Second Medical University. High-quality DNA was extracted from the fresh leaves using a modified cetyltrimethylammonium bromide (CTAB) method [22], and its quality was assessed by 0.75% agarose gel electrophoresis and with a NanoDrop One spectrophotometer (Thermo Fisher Scientific). DNA samples that passed quality control were sequenced by Novogene Co., Ltd. on the Illumina NovaSeq 6000 platform (short reads) and the Nanopore PromethION platform (Oxford Nanopore Technologies, UK; long reads), following the Illumina NovaSeq 6000 sequencing protocol and the Nanopore PromethION library preparation protocol, respectively. Raw reads were filtered with fastp v0.21.0 [23] (short reads) and NanoFilt v2.8.0 [24] (long reads).
Assembly and annotation of the mitogenome and plastome
The mitochondrial genome of E. lindleyanum was assembled de novo from the long reads using Flye [25] with default parameters, and the assembly graph was visualized with Bandage v0.8.1 [26]. To identify mitochondrial contigs, the assembly was searched with BLASTn v2.13.0 against the complete mitochondrial genome of E. chinense (NC_082294) as the reference, using the parameters -evalue 1e-5 -outfmt 6 -max_hsps 10 -word_size 7 -task blastn-short [27]. The contigs identified in these results were used to construct a draft mitochondrial genome. Short and long reads were then mapped to the draft contigs using BWA v0.1.17 [28], and the mapped reads were extracted for hybrid assembly. The final mitochondrial genome of E. lindleyanum was assembled from both short and long reads using Unicycler v0.4.7 with default settings apart from --kmers 21,45,65,89 [29].
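The contig-screening step above can be sketched in Python. This is a minimal, hypothetical illustration rather than the authors' pipeline: it assumes the BLASTn hits are in the standard 12-column tabular format written by -outfmt 6, with the E. lindleyanum contigs as the query, and it keeps contigs whose length is covered by reference hits at or above an arbitrary 50% (the paper does not state its selection criterion). The function names, contig names, and lengths are invented for the example.

```python
import csv
from collections import defaultdict


def covered_length(intervals):
    """Total length covered by 1-based closed intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def candidate_mito_contigs(outfmt6_lines, contig_lengths, min_fraction=0.5):
    """Return {contig: covered fraction} for contigs with coverage >= min_fraction.

    min_fraction is a hypothetical threshold chosen for illustration only.
    """
    hits = defaultdict(list)
    for row in csv.reader(outfmt6_lines, delimiter="\t"):
        # -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        #                    qstart qend sstart send evalue bitscore
        qstart, qend = int(row[6]), int(row[7])
        hits[row[0]].append((min(qstart, qend), max(qstart, qend)))
    selected = {}
    for contig, intervals in hits.items():
        fraction = covered_length(intervals) / contig_lengths[contig]
        if fraction >= min_fraction:
            selected[contig] = round(fraction, 3)
    return selected


if __name__ == "__main__":
    blast_rows = [
        "contig_1\tNC_082294\t98.5\t600\t9\t0\t1\t600\t1000\t1599\t0.0\t1050",
        "contig_1\tNC_082294\t97.0\t401\t12\t0\t500\t900\t8000\t8400\t0.0\t690",
        "contig_2\tNC_082294\t85.0\t300\t45\t0\t1\t300\t20000\t20299\t1e-50\t300",
    ]
    lengths = {"contig_1": 1000, "contig_2": 2000}
    print(candidate_mito_contigs(blast_rows, lengths))  # {'contig_1': 0.9}
```

Merging overlapping hits before dividing by contig length prevents a contig with many redundant short hits from appearing more mitochondrial than it is.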
Although plant mitochondrial genomes are structurally diverse, their gene content is relatively conserved, so related species can serve as annotation references. Protein-coding genes were annotated with GeSeq [30] using the mitochondrial genome of E. chinense (NC_082294) as the reference, with the mitochondrial genomes of Arabidopsis thaliana (NC_037304) and Liriodendron tulipifera (NC_021152.1) as auxiliary references. The annotations of certain genes were further refined with the Intelligent Plant Mitochondrial Genome Annotator (IPMGA, http://www.1kmpg.cn/ipmga/). tRNA genes were annotated using tRNAscan-SE v2.0.11 [31], and rRNA genes were identified using BLASTn v2.13.0 [27]. All annotations were manually curated in Apollo v1.11.8 to correct errors [32], and the final mitochondrial genome map of E. lindleyanum was drawn with Organellar Genome DRAW (OGDRAW) [33].
The plastid genome was assembled from the filtered short reads using GetOrganelle v1.7.7 (-k 21,45,65,85,105; -F embplant_pt) [34], and the result was visualized with Bandage v0.8.1 [26]. Initial annotation was performed with GeSeq [30] using the plastid genomes of E. fortunei (OK545755) and E. chinense (NC_072212) as references. Start and stop codons and gene boundaries were then corrected manually by comparison with the reference plastid genome sequences. The final plastid genome map of E. lindleyanum was drawn with Organellar Genome DRAW (OGDRAW) [33].
To evaluate assembly completeness, third-generation (long) reads were aligned to both the mitochondrial and plastid genomes using BWA v0.1.17 with default parameters [28]. Per-base coverage depth across each genome was then calculated with SAMtools v1.15 (samtools depth -a, which also reports positions with zero coverage) [35].
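The completeness check above reduces to two numbers per sequence: the percentage of positions with non-zero depth and the mean depth. A minimal sketch, assuming the three-column text output of samtools depth -a (sequence name, 1-based position, depth) has been read into a list of lines; the function name and the sequence name "mt" in the example are hypothetical.

```python
from collections import defaultdict


def coverage_summary(depth_lines):
    """Summarize `samtools depth -a` output per sequence.

    Because -a also reports zero-depth positions, the number of lines for a
    sequence equals its length, so no separate length table is needed.
    """
    stats = defaultdict(lambda: {"positions": 0, "covered": 0, "depth_sum": 0})
    for line in depth_lines:
        name, _pos, depth = line.rstrip("\n").split("\t")
        d = int(depth)
        s = stats[name]
        s["positions"] += 1
        s["depth_sum"] += d
        if d > 0:
            s["covered"] += 1
    return {
        name: {
            "coverage_pct": 100 * s["covered"] / s["positions"],
            "mean_depth": s["depth_sum"] / s["positions"],
        }
        for name, s in stats.items()
    }


if __name__ == "__main__":
    example = ["mt\t1\t10", "mt\t2\t0", "mt\t3\t20", "mt\t4\t10"]
    print(coverage_summary(example))
    # {'mt': {'coverage_pct': 75.0, 'mean_depth': 10.0}}
```

Here the mean depth includes zero-depth positions in the denominator; a genome reported at 100% coverage has no such positions.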
Analysis of codon usage bias, repeat fragments and prediction of RNA editing sites
Based on the annotated mitochondrial and plastid genomes of E. lindleyanum, protein-coding genes (PCGs) were extracted using PhyloSuite v1.2.3 [36]. We identified 32 unique mitochondrial PCGs (28,983 bp total length) and 80 plastid PCGs (63,630 bp total length) in E. lindleyanum. Codon usage and relative synonymous codon usage (RSCU) values were analyzed using MEGA v11.0.13 [37].
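RSCU compares the observed count of each codon with the count expected if all synonymous codons for that amino acid were used equally: RSCU_ij = x_ij / ((1/n_i) Σ_j x_ij), where x_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for amino acid i. The authors used MEGA; the following independent sketch only illustrates the calculation. It assumes the standard genetic code (plastid code 11 has the same codon assignments), treats the three stop codons as one synonymous group, and skips incomplete or ambiguous codons. The function name is hypothetical.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds_sequences):
    """Return {codon: RSCU} pooled over all coding sequences (DNA or RNA alphabet)."""
    counts = Counter()
    for cds in cds_sequences:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons containing N or other ambiguity codes
                counts[codon] += 1

    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)

    values = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)  # count expected under equal synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    values = rscu(["ATGGCTGCTGCCTAA"])  # Met, Ala, Ala, Ala, stop
    print(round(values["GCT"], 2), values["ATG"])  # 2.67 1.0
```

With this definition, a codon that is the only codon for its amino acid, such as AUG (Met) or UGG (Trp), always has RSCU = 1.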
Simple sequence repeats (SSRs), tandem repeats, and dispersed repeats in the E. lindleyanum mitochondrial and plastid genomes were identified using the online tool MISA (https://webblast.ipk-gatersleben.de/misa/) [38], Tandem Repeats Finder v4.09 (https://tandem.bu.edu/trf/trf.unix.help.html) [39], and REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer/) [40], respectively. In MISA, the minimum numbers of repeat units for mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSRs were set to 10, 5, 4, 3, 3, and 3, respectively. The results were plotted in Excel.
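The MISA thresholds above can be illustrated with a small regular-expression scanner. This is a simplified stand-in, not MISA's implementation: it reports perfect SSRs only, does not merge compound SSRs, ignores the circular wrap-around of organellar genomes, and skips motifs that are themselves repeats of a shorter unit (so a run of AT is reported as the dinucleotide AT, not as ATAT). The function names are hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as set in MISA for this study.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if motif is not a repeat of a shorter unit (e.g. 'ATAT' is not primitive)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif length, motif, repeat count) tuples."""
    seq = seq.upper()
    hits = []
    for k, m in min_repeats.items():
        # A k-base unit followed by at least m - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, m - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # already reported at its shorter period
            hits.append((match.start() + 1, k, motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    demo = "G" + "T" * 12 + "GC" + "AT" * 6 + "GGC"
    print(find_ssrs(demo))  # [(2, 1, 'T', 12), (16, 2, 'AT', 6)]
```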
RNA editing sites (C-to-U) in mitochondrial PCGs were predicted using the convolutional neural network (CNN)-based tool Deepred-mt [41]. Predicted sites with a probability score greater than 0.9 (cutoff = 0.9) were retained.
Identification of mitochondrial plastid sequences
To detect plastid fragment transfers in the mitochondrial genome of E. lindleyanum, homologous regions between the plastid and mitochondrial genomes were identified using BLASTn v2.13.0 (-evalue 1e-5 -outfmt 6 -perc_identity 80) [28]. The results were visualized using the Circos package v0.69.9 [42].
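Totals such as the combined MTPT length and its share of the mitogenome can be derived from the tabular BLASTn output. A minimal sketch, assuming the mitochondrial genome was the query (so its coordinates are in columns 7 and 8 of -outfmt 6) and that overlapping hits are merged before summing, so a fragment matching both plastid inverted repeats is counted once; the paper does not say whether the authors merged overlaps. The function name and hit rows are invented for illustration.

```python
def summarize_mtpts(outfmt6_lines, mito_length, mito_is_query=True):
    """Merge BLASTn hits on mitochondrial coordinates and summarize MTPT content."""
    intervals = []
    for line in outfmt6_lines:
        cols = line.rstrip("\n").split("\t")
        # -outfmt 6: qstart/qend are columns 7-8, sstart/send are columns 9-10.
        s, e = (cols[6], cols[7]) if mito_is_query else (cols[8], cols[9])
        s, e = int(s), int(e)
        intervals.append((min(s, e), max(s, e)))
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # overlap: extend the fragment
        else:
            merged.append([start, end])
    lengths = [end - start + 1 for start, end in merged]
    total = sum(lengths)
    return {
        "n_fragments": len(merged),
        "total_bp": total,
        "pct_of_mitogenome": 100 * total / mito_length,
        "lengths": lengths,
    }


if __name__ == "__main__":
    rows = [
        "mt\tpt\t95.0\t100\t5\t0\t100\t199\t5000\t5099\t1e-30\t180",
        "mt\tpt\t90.0\t151\t15\t0\t150\t300\t5050\t5200\t1e-40\t200",
        "mt\tpt\t85.0\t30\t4\t0\t1000\t1029\t9000\t8971\t1e-5\t40",
    ]
    print(summarize_mtpts(rows, mito_length=2000))
    # The reported totals: 11,245 bp of a 299,285 bp mitogenome
    print(round(100 * 11245 / 299285, 2))  # 3.76
```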
Construction of maximum likelihood tree based on the PCGs
To assess the potential of mitochondrial genome data for reconstructing phylogenetic relationships at the family level and above, we constructed a phylogenetic tree from the mitochondrial genomes of 19 campanulid species and two outgroup species (Additional file 1). To investigate whether mitochondrial data can resolve relationships at the genus level, we also retrieved the complete mitochondrial and plastid genome sequences of 35 Asterales species from NCBI (Additional file 1). Two Lamiaceae species (Scutellaria barbata and Salvia miltiorrhiza) and two Campanulaceae species (Codonopsis lanceolata and Platycodon grandiflorus) were used as outgroups. These species, together with the newly sequenced E. lindleyanum, were included in the phylogenetic analysis. The shared protein-coding genes of the mitochondrial and plastid datasets were extracted using PhyloSuite v1.2.2 [36] (Additional file 2) and aligned with MAFFT v7.409 [43]. Maximum likelihood trees were inferred in IQ-TREE v1.6.12 with the parameters -alrt 1000 -bb 5000 [44], using the best-fit substitution models selected by the Bayesian Information Criterion (mitochondrial: GTR + F + R2; plastid: GY + F + R3). The final trees were visualized and edited with the online tool iTOL [45].
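Before tree inference, per-gene alignments are commonly concatenated into a single supermatrix. The paper does not describe this step (PhyloSuite offers concatenation), so the following is a hypothetical sketch: it assumes one aligned sequence per taxon per gene, fills genes missing from a taxon with gaps, records each gene's column range, and writes relaxed PHYLIP, a format IQ-TREE can read. Function, gene, and taxon names are placeholders.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate {gene: {taxon: aligned_seq}} into one supermatrix.

    Returns ({taxon: concatenated_seq}, {gene: (first_col, last_col)}) with
    1-based column ranges.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    ranges = {}
    start = 1
    for gene, aln in gene_alignments.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: sequences differ in length; align them first")
        width = widths.pop()
        for taxon in taxa:
            # A taxon lacking this gene contributes an all-gap block.
            parts[taxon].append(aln.get(taxon, "-" * width))
        ranges[gene] = (start, start + width - 1)
        start += width
    return {taxon: "".join(blocks) for taxon, blocks in parts.items()}, ranges


def to_relaxed_phylip(matrix):
    """Format a {taxon: sequence} supermatrix as relaxed PHYLIP text."""
    n_sites = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_sites}"]
    lines += [f"{taxon} {seq}" for taxon, seq in matrix.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    alignments = {
        "gene1": {"Taxon_A": "ATG-", "Taxon_B": "ATGC"},
        "gene2": {"Taxon_A": "GG", "Taxon_C": "GA"},
    }
    matrix, ranges = concatenate_alignments(alignments)
    print(ranges)  # {'gene1': (1, 4), 'gene2': (5, 6)}
    print(to_relaxed_phylip(matrix))
```

The recorded column ranges would also allow a partitioned analysis, although the single best-fit models reported above suggest an unpartitioned one.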
Comparison analyses of mitogenomes
To assess mitochondrial genome similarity and collinearity between E. lindleyanum and other Asteraceae species, ten species were selected for collinearity analysis based on the phylogenetic reconstruction. Conserved homologous sequences among the mitochondrial genomes were identified using BLASTn v2.13.0 with the parameters -evalue 1e-5 -word_size 9 -gapopen 5 -gapextend 2 -reward 2 -penalty 3 [28]. Homologous sequences ≥ 500 bp were retained, and multiple collinearity maps were generated with the core program of MCScanX [46]. To visualize sequence variation between the mitochondrial genome of E. lindleyanum and those of the other species, dot plots were also generated with the online tool D-GENIES (default parameters), using the E. lindleyanum mitochondrial genome as the reference [47].
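The ≥ 500 bp filter, together with a summary of how much of the reference is covered at different identity levels, can be sketched from BLASTn hits. This is an illustrative approximation only: D-GENIES computes its own alignments and identity classes, whereas this hypothetical function takes BLAST-style tuples of reference start, reference end, alignment length, and percent identity, gives each reference position the highest identity of any retained hit covering it, and uses bin boundaries chosen here for illustration.

```python
def identity_profile(hits, ref_length, min_len=500):
    """Percent of reference positions in each identity class.

    hits: iterable of (ref_start, ref_end, aln_length, pident), 1-based coordinates.
    Hits shorter than min_len are discarded, mirroring the >= 500 bp filter.
    """
    best = [None] * (ref_length + 1)  # index 0 unused; None means not covered
    for start, end, aln_len, pident in hits:
        if aln_len < min_len:
            continue
        for pos in range(min(start, end), max(start, end) + 1):
            if best[pos] is None or pident > best[pos]:
                best[pos] = pident
    counts = {"no match": 0, "<=50%": 0, "50-75%": 0, ">75%": 0}
    for pident in best[1:]:
        if pident is None:
            counts["no match"] += 1
        elif pident > 75:
            counts[">75%"] += 1
        elif pident > 50:
            counts["50-75%"] += 1
        else:
            counts["<=50%"] += 1
    return {label: 100 * n / ref_length for label, n in counts.items()}


if __name__ == "__main__":
    toy_hits = [(1, 4, 600, 98.0), (3, 6, 700, 60.0), (8, 9, 100, 99.0)]
    print(identity_profile(toy_hits, ref_length=10))
    # {'no match': 40.0, '<=50%': 0.0, '50-75%': 20.0, '>75%': 40.0}
```

For a genome of about 300 kb, a per-position list is small enough to hold in memory.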
Results
Assembly of E. lindleyanum organellar genomes
We performed Illumina NovaSeq and Nanopore PromethION sequencing of total DNA from E. lindleyanum, generating 13.4 Gb and 12 Gb of raw data, respectively (deposited in the NCBI SRA database, BioProject ID: PRJNA1195372). Visualization of the long-read mitochondrial assembly in Bandage showed that the E. lindleyanum mitochondrial genome comprises 12 nodes (Fig. 1A, Additional file 3). These nodes form a complex, multi-branched closed structure that represents the complete mitochondrial genome sequence of this species. To resolve the critical branching nodes, we extracted the sequences of the branching regions and mapped them to the long reads; two sequences at a branching node were considered linked when a single long read joined them head to tail. After all branching nodes had been resolved with the long-read data, the final mitochondrial genome was a single circular molecule (Fig. 1B, Additional file 4). Reads covered 100% of the mitochondrial genome, with a mean depth of 185.27× (Additional file 5).
Fig. 1. Mitochondrial genome of Eupatorium lindleyanum displayed in Bandage. Topological structures of the mitochondrial contigs (A) and circular structure of the mitochondrial genome (B)
The complete mitochondrial genome of E. lindleyanum is 299,285 bp long, with a GC content of 45.06%. It contains 32 unique PCGs, 17 tRNA genes (5 of which are multicopy), and 3 rRNA genes (2 of which are multicopy) (Fig. 2A, Table 1). The PCGs comprise 24 core and 8 non-core mitochondrial genes. The 24 core genes are five ATP synthase genes (atp1, atp4, atp6, atp8, atp9), nine NADH dehydrogenase genes (nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9), one cytochrome b gene (cob), four cytochrome c biogenesis genes (ccmB, ccmC, ccmFC, ccmFN), three cytochrome c oxidase genes (cox1, cox2, cox3), one maturase gene (matR), and one membrane transport protein gene (mttB). The 8 non-core genes are three ribosomal protein large subunit genes (rpl5, rpl10, rpl16), four ribosomal protein small subunit genes (rps3, rps4, rps12, rps13), and one succinate dehydrogenase gene (sdh4). The three rRNA genes are rrn5, rrn18, and rrn26. Eight genes contain introns (nad1, nad2, nad4, nad5, nad7, ccmFC, rps3, cox2). The E. lindleyanum mitochondrial genome has been deposited in GenBank (accession number: PQ157699).
Fig. 2. Circular map of the mitochondrial genome (A) and plastid genome (B) of Eupatorium lindleyanum. Genomic features transcribed clockwise are shown on the inner circle, and those transcribed counterclockwise on the outer circle. Genes are color-coded by functional category. The GC content is represented on the inner circle by a dark gray plot.
Table 1. Gene composition in the Eupatorium lindleyanum mitogenome
Group of genes | Name of genes
ATP synthase | atp1, atp4, atp6, atp8, atp9
NADH dehydrogenase | nad1, nad2, nad3 (× 2), nad4, nad4L, nad5, nad6, nad7, nad9
Cytochrome b | cob
Cytochrome c biogenesis | ccmB, ccmC, ccmFC, ccmFN
Cytochrome c oxidase | cox1, cox2, cox3
Maturases | matR
Protein transport subunit | mttB
Ribosomal protein large subunit | rpl5, rpl10, rpl16
Ribosomal protein small subunit | rps3, rps4, rps12, rps13
Succinate dehydrogenase | sdh4
Ribosomal RNA | rrn5 (× 2), rrn18 (× 2), rrn26
Transfer RNA | trnC-GCA (× 2), trnD-GUC (× 2), trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC, trnH-GUG, trnI-CAU (× 2), trnK-UUU, trnM-CAU, trnN-GUU (× 2), trnP-UGG, trnQ-UUG (× 2), trnS-GCU, trnT-UGU, trnW-CCA, trnY-GUA
Note: The number in brackets is the copy number of the gene.
We also assembled the plastid genome of E. lindleyanum from the Illumina NovaSeq data. It has the typical quadripartite structure, consisting of a large single-copy region (LSC), a small single-copy region (SSC), and two inverted repeat regions (IRA and IRB) (Fig. 2B). Reads covered 100% of the plastid genome, with a mean depth of 1,694.41× (Additional file 6).
The plastid genome is 151,377 bp long, with a GC content of 37.60%. The LSC region is 83,236 bp (GC content 35.68%), each IR region is 24,922 bp (43.14%), and the SSC region is 18,297 bp (31.26%). The plastid genome contains 80 unique PCGs (8 of which are multicopy), 28 tRNA genes (8 of which are multicopy), and 4 rRNA genes (all multicopy) (Additional file 7). A total of 22 genes contain introns. The E. lindleyanum plastid genome has been deposited in GenBank (accession number: PQ157700).
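The region sizes and GC contents reported above are mutually consistent, as a quick check shows: the four regions sum to the genome length, and the length-weighted mean of the regional GC contents (counting both IR copies) reproduces the genome-wide value.

```latex
\begin{aligned}
L_{\mathrm{LSC}} + 2L_{\mathrm{IR}} + L_{\mathrm{SSC}}
  &= 83{,}236 + 2(24{,}922) + 18{,}297 = 151{,}377\ \mathrm{bp}\\[4pt]
\mathrm{GC}_{\mathrm{total}}
  &= \frac{83{,}236(35.68) + 2(24{,}922)(43.14) + 18{,}297(31.26)}{151{,}377}\,\%
   \approx \frac{5{,}692{,}095}{151{,}377}\,\% \approx 37.60\%
\end{aligned}
```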
Analysis of codon usage bias
Codon usage bias, shaped by long-term evolutionary selection, is an important indicator of translational efficiency and genomic adaptation. To examine it in E. lindleyanum, we calculated RSCU values for the 32 mitochondrial and 80 plastid PCGs (Fig. 3; Additional file 8). An RSCU value greater than 1 indicates that a codon is used more often than its synonymous codons, a value of 1 indicates unbiased usage, and a value below 1 indicates lower usage than its synonymous codons [48]. Both organellar genomes showed pronounced codon usage bias, except for the start codon AUG (Met) and the tryptophan codon UGG (Trp), which are the only codons for their amino acids and therefore always have RSCU = 1.0. In the mitochondrial PCGs, the alanine (Ala) codon GCU showed the strongest preference (RSCU = 1.64). Other highly preferred mitochondrial codons included the histidine (His) codon CAU (1.55), the tyrosine (Tyr) codon UAU (1.52), the glutamine (Gln) codon CAA (1.51), and the stop codon UAA (1.5). Codon bias was stronger in the plastid PCGs, for example for the leucine (Leu) codon UUA (RSCU = 1.91), the serine (Ser) codon UCU (1.83), the alanine (Ala) codon GCU (1.79), and the arginine (Arg) codon AGA (1.75). Among the mitochondrial codons, 87.5% of those ending in A or T (U) had RSCU values greater than 1.0, whereas 93.75% of those ending in C or G had RSCU values less than 1.0. This indicates a general preference for A or T (U) at the third codon position in the E. lindleyanum mitochondrial genome, consistent with findings from related studies [19, 49].
Fig. 3. Relative synonymous codon usage (RSCU) in the Eupatorium lindleyanum mitogenome and plastome. The different amino acids are represented on the x-axis
Repeat fragments and prediction of RNA editing sites
Repetitive sequences in plant mitochondrial genomes include tandem repeats and dispersed repeats. Simple sequence repeats (SSRs), a specific type of tandem repeat, typically consist of repeat units no longer than 6 bp. We identified 73 SSRs in the mitochondrial genome of E. lindleyanum and 30 in the plastid genome, and their composition differed markedly between the two organelles (Fig. 4A, B, Additional file 9). Among the mitochondrial SSRs, tetranucleotide repeats were the most abundant (32, 43.84%), followed by dinucleotide (14, 19.18%), trinucleotide (13, 17.81%), and mononucleotide repeats (9, 12.33%). Thymine (T) repeats accounted for 66.67% of the mononucleotide SSRs. Pentanucleotide and hexanucleotide repeats were rare, with 2 and 3 occurrences, respectively (Fig. 4A). In contrast, mononucleotide repeats dominated the plastid SSRs (22, 73.3%); trinucleotide and hexanucleotide SSRs each occurred once, and no pentanucleotide SSRs were detected (Fig. 4B).
Fig. 4. Type and number of SSRs and repeats in Eupatorium lindleyanum mitochondrial (A, C) and plastid (B, D) genomes
Additionally, 31 tandem repeats with sequence identity > 80% and lengths of 6 to 71 bp were detected in the mitochondrial genome (Fig. 4C, Additional file 10), all of them located in intergenic regions. In contrast, some plastid tandem repeats lie within coding regions, namely the rps18, ycf2, and ycf1 genes, a pattern not observed in the mitochondrial genome (Fig. 4D; Additional file 10). Notably, four mitochondrial tandem repeats were located in the intergenic regions between trnQ-UUG and trnK-UUU and between cox1 and mttB.
Dispersed repeats, defined as non-adjacent repeats ≥ 30 bp, also differed markedly between the two organellar genomes of E. lindleyanum. The mitochondrial genome contained 274 dispersed repeat pairs, comprising 117 palindromic and 157 forward repeats; no reverse or complementary repeats were detected (Fig. 4C). Most mitochondrial repeats were short: 89.8% (246/274) were < 100 bp (Additional file 11). The longest were a 2,708 bp palindromic repeat spanning the atp4-atp6 intergenic region and a 6,509 bp forward repeat encompassing nad3, nad5, rrn5, and rrn18; such long repeats may mediate recombination and large-scale genomic rearrangements. The plastid genome contained far fewer dispersed repeats (15 palindromic and 12 forward), and they were even shorter: 93% (25/27) were < 50 bp (Fig. 4D, Additional file 11).
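REPuter's forward and palindromic classes can be illustrated with an exact-match sketch: a forward repeat is a second copy of a segment on the same strand, and a palindromic repeat (in REPuter's terminology) is a reverse-complement copy. This simplified stand-in, not REPuter's algorithm, indexes all k-mers, pairs identical (forward) or reverse-complementary (palindromic) k-mers, and chains consecutive seed pairs into maximal exact repeats. Unlike REPuter, it allows no mismatches, ignores the circular wrap-around, and may report overlapping copies inside tandem arrays; the default k = 30 matches the 30 bp minimum used here. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def seed_pairs(seq, k):
    """0-based start pairs (a, b), a < b, of identical or reverse-complementary k-mers."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    forward, palindromic = set(), set()
    for kmer, starts in index.items():
        forward.update(combinations(starts, 2))  # starts are ascending, so a < b
        for b in index.get(revcomp(kmer), []):
            for a in starts:
                if a < b:
                    palindromic.add((a, b))
    return forward, palindromic


def chain_seeds(pairs, k, step):
    """Collapse runs of seed pairs into maximal exact repeats.

    step=+1: forward copies advance together; step=-1: the reverse-complement
    copy moves backwards as the first copy moves forwards.
    Returns (start1, start2, length) tuples with 1-based starts.
    """
    repeats = []
    for a, b in pairs:
        if (a - 1, b - step) in pairs:
            continue  # not the first seed of its run
        n = 1
        while (a + n, b + n * step) in pairs:
            n += 1
        second_start = b if step == 1 else b - (n - 1)
        repeats.append((a + 1, second_start + 1, k + n - 1))
    return sorted(repeats)


def dispersed_repeats(seq, k=30):
    """Return (forward, palindromic) exact repeats of length >= k."""
    seq = seq.upper()
    forward, palindromic = seed_pairs(seq, k)
    return chain_seeds(forward, k, 1), chain_seeds(palindromic, k, -1)


if __name__ == "__main__":
    # Toy sequences with k=4 so the repeats are easy to see.
    print(dispersed_repeats("GATTACA" + "TTT" + "GATTACA", k=4))  # ([(1, 11, 7)], [])
    print(dispersed_repeats("GATTACA" + "GG" + "TGTAATC", k=4))   # ([], [(1, 10, 7)])
```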
To characterize post-transcriptional modification of mitochondrial transcripts, we predicted RNA editing sites in the 32 mitochondrial PCGs. A total of 504 potential RNA editing sites were identified, all of them C-to-U conversions (Additional file 12). These sites occurred only at the first or second codon positions, more often at the second. ccmB and mttB had the most editing sites (37 each), followed by nad7 (35) (Fig. 5A, Additional file 12). Of the 504 editing events, 27 (5.36%) were synonymous, involving nine amino acids, including phenylalanine (6 occurrences) and valine (4 occurrences). The remaining 477 events were nonsynonymous and produced 12 types of amino acid change; the most frequent were serine to leucine (111 occurrences) and proline to leucine (111 occurrences) (Fig. 5B).
Fig. 5. Number of RNA editing sites identified in protein-coding genes (A) and changes in translation products of genes before and after RNA editing (B) of Eupatorium lindleyanum mitochondrial genome
Notably, RNA editing was predicted to create the start codons of cox1 and nad4L by converting ACG (threonine) to AUG (methionine), and to create the stop codons of atp9 and ccmFC by converting CGA (arginine) to UGA (stop codon).
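The effect of each predicted C-to-U edit follows from translating the codon before and after the change. A small sketch using the standard genetic code in the RNA alphabet; the function names are hypothetical, and the examples reproduce changes described above (ACG to AUG, CGA to UGA, and the frequent serine-to-leucine and proline-to-leucine edits).

```python
BASES = "UCAG"
# Standard genetic code (NCBI table 1) in UCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def edit_codon(codon, position):
    """Apply a C-to-U edit at a 1-based codon position (1, 2, or 3)."""
    codon = codon.upper().replace("T", "U")
    if codon[position - 1] != "C":
        raise ValueError(f"position {position} of {codon} is not C")
    return codon[:position - 1] + "U" + codon[position:]


def classify_edit(codon, position):
    """Return (amino acid before, amino acid after, effect) for one C-to-U edit."""
    before_codon = codon.upper().replace("T", "U")
    after_codon = edit_codon(before_codon, position)
    before, after = CODON_TABLE[before_codon], CODON_TABLE[after_codon]
    effect = "synonymous" if before == after else "nonsynonymous"
    return before, after, effect


if __name__ == "__main__":
    for codon, pos in [("ACG", 2), ("CGA", 1), ("UCA", 2), ("CCA", 2), ("CUA", 1)]:
        print(codon, pos, classify_edit(codon, pos))
    # ACG 2 ('T', 'M', 'nonsynonymous')   start codon created
    # CGA 1 ('R', '*', 'nonsynonymous')   stop codon created
    # UCA 2 ('S', 'L', 'nonsynonymous')
    # CCA 2 ('P', 'L', 'nonsynonymous')
    # CUA 1 ('L', 'L', 'synonymous')
```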
Mitochondrial plastid transferred fragments
Mitochondrial plastid transferred fragments (MTPTs) are plastid-derived sequences present in mitochondrial genomes. Sequence similarity analysis identified 15 homologous fragments shared between the mitochondrial and plastid genomes of E. lindleyanum, with a total length of 11,245 bp, or 3.76% of the mitochondrial genome (Fig. 6). Fragment lengths ranged from 30 to 2,637 bp; 10 fragments were shorter than 100 bp and three exceeded 2,000 bp. These homologous sequences contained eight complete genes: two protein-coding genes (ndhK and petG) and six tRNA genes (trnD-GUC, trnH-GUG, trnM-CAU, trnN-GUU, trnP-UGG, and trnW-CCA) (Additional file 13). Although the origins of these fragments remain unclear, the results indicate that DNA has been transferred between the plastid and mitochondrial genomes of E. lindleyanum.
Fig. 6. Homologous analysis based on different organelles. Mitochondrial plastid transferred fragments are represented by pink lines connecting the blue (mitochondrial genome) and green (plastid genome) arcs
Phylogenetic and collinear analysis
We reconstructed the phylogenetic relationships of 19 campanulid species and two outgroup taxa using mitochondrial genome data. The campanulid species belong to the orders Aquifoliales, Apiales, Dipsacales, and Asterales. A maximum likelihood (ML) tree was inferred from the sequences of 36 conserved mitochondrial PCGs (Additional file 2). The species formed four distinct clades, each corresponding to one campanulid order, and most branches received 100% bootstrap support (Fig. 7). The inter-order relationships were consistent with previous phylogenetic studies [50, 51].
Fig. 7. Phylogenetic relationships of 19 Campanulids species. Bootstrap values are shown at each node. Colors represent the respective groups to which each species belongs
To further evaluate whether mitochondrial data can resolve phylogenetic relationships at the genus level, we reconstructed phylogenetic trees of Asteraceae from the mitochondrial and plastid genomes of 37 species. Both ML trees were based on conserved PCGs (28 mitochondrial and 79 plastid PCGs, respectively; Additional file 2), and most nodes in both trees received 100% bootstrap support (Fig. 8). In both trees, E. lindleyanum was most closely related to its congener E. chinense, and the two formed a clade with Ageratum conyzoides (highlighted with a pink background in Fig. 8). However, the mitochondrial and plastid phylogenies differed in several respects. Congeneric species were monophyletic in the plastid tree but not always in the mitochondrial tree; for example, Chrysanthemum was not monophyletic in the mitochondrial tree because Artemisia argyi was nested within it (Fig. 8). The Eupatorium-Ageratum clade was sister to Bidens species in the mitochondrial tree but to Helianthus species in the plastid tree. Similar discordances were observed for the clades highlighted with yellow, gray, and orange backgrounds in Fig. 8, and the bootstrap values for these branches were noticeably lower in the mitochondrial tree than in the plastid tree. Furthermore, the positions of Bidens biternata and Bidens alba var. radiata within the Bidens clade were swapped between the two phylogenies, as were those of Cichorium intybus and Taraxacum mongolicum. Overall, the mitochondrial tree had lower bootstrap support than the plastid tree, suggesting that mitochondrial PCG data have limited power for resolving phylogenies at lower taxonomic levels.
Fig. 8. Comparison of phylogenetic trees based on mitochondrial and plastid genomes of 37 species. Bootstrap values are indicated above the branches
Based on the phylogenetic results, we performed mitochondrial genome synteny analyses for 10 species occupying different phylogenetic positions (Figs. 8 and 9). These analyses revealed extensive homologous sequences among Asteraceae species but marked differences in their arrangement (Fig. 10). E. lindleyanum shared substantial homologous sequence with its close relatives E. chinense and Ageratum conyzoides, and the homologous sequences shared with E. chinense were more abundant and longer than those shared with A. conyzoides. Only 12.05% of the sequence was unmatched between E. lindleyanum and E. chinense, whereas sequences with > 75% similarity accounted for 74.90% (Fig. 9).
Fig. 9. Dot-plot graphs illustrating syntenic sequences between the mitogenomes of Asteraceae species using Eupatorium lindleyanum as the reference
Fig. 10. Collinearity analysis among the mitochondrial genomes of 10 Asteraceae species. Red curved areas highlight regions of inversion, while gray areas represent regions with high homology
In contrast, Bidens bipinnata and Helianthus tuberosus showed the lowest similarity, with sequences of more than 75% similarity accounting for only 17.38% and 18.50%, respectively. Other species, such as Saussurea inversa, Lactuca serriola, Arctium tomentosum, and Carthamus tinctorius, showed higher similarity, with sequences of more than 75% similarity accounting for 27.15%, 25.76%, 25.62%, and 24.67%, respectively. Despite being congeneric, E. lindleyanum and E. chinense displayed substantial genomic rearrangements and marked structural differences in their mitochondrial genomes (Fig. 10).
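Percentages such as the 12.05% unmatched and the 74.90% with more than 75% similarity can be read as fractions of the reference mitogenome (here E. lindleyanum) covered by aligned blocks. The section does not say which tool produced these values, so the following is only a hypothetical sketch: it assumes alignment blocks are available as half-open (start, end) reference coordinates with a percent identity, merges overlapping blocks so that no reference base is counted twice, and reports both fractions.

```python
from typing import List, Tuple

Block = Tuple[int, int, float]  # (start, end) half-open on the reference, percent identity


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Total length covered by half-open intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the run
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def similarity_profile(ref_length: int, blocks: List[Block], threshold: float = 75.0) -> Tuple[float, float]:
    """Return (% of reference unmatched, % of reference in blocks with identity > threshold)."""
    covered = merged_length([(s, e) for s, e, _ in blocks])
    high = merged_length([(s, e) for s, e, ident in blocks if ident > threshold])
    return 100 * (ref_length - covered) / ref_length, 100 * high / ref_length


# Toy usage: a 1,000 bp reference with three hypothetical alignment blocks
print(similarity_profile(1000, [(0, 300, 98.0), (250, 400, 90.0), (500, 600, 70.0)]))
```

Merging matters because repeat-rich mitogenomes often produce overlapping hits; summing raw block lengths would overstate the shared fraction.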
Discussion
E. lindleyanum is an important medicinal plant with high therapeutic value, widespread applications, and abundant resources. However, the lack of molecular studies has limited the efficient utilization and development of this species. In this study, we present the first complete sequences of the mitochondrial and plastid genomes of E. lindleyanum. We comprehensively analyzed the mitochondrial genome, including its sequence composition, structural organization, repeat elements, RNA editing sites, and intracellular gene transfer events, to uncover the molecular evolutionary characteristics of this species. Furthermore, we constructed phylogenetic trees based on mitochondrial genome data to assess the utility of mitochondrial genomes for resolving phylogenetic relationships within Asteraceae.
Characterization of the E. lindleyanum mitochondrial genome
Mitochondria are eukaryotic organelles that have their own genetic system and supply energy for life processes [52]. Unlike the structurally conserved plastid genomes, mitochondrial genomes can adopt complex circular or linear structures owing to their abundant repetitive sequences [53]. With the maturation of third-generation sequencing technologies, more plant mitochondrial genomes have been successfully assembled, revealing various structural types such as single circles, multiple circles, and combined circular and linear forms [51, 54, 55].
In this study, we used a hybrid assembly strategy combining second-generation short reads and third-generation long reads to assemble and analyze the complete mitochondrial genome of E. lindleyanum for the first time. The mitochondrial genome has a single circular structure typical of terrestrial plants, with a total length of 299,285 bp and a GC content of 45.06%, similar to that of its congener E. chinense (286,160 bp; 45.09%; NC082294). Mitochondrial genome size varies remarkably among plants, from as small as 66 kb in Vitis rotundifolia to as large as 11.7 Mb in Larix sibirica [55, 56]. Within Asteraceae, the mitochondrial genome of E. lindleyanum is larger than those of Bidens pilosa (183,061 bp), Chrysanthemum indicum (208,791 bp), Ageratum conyzoides (219,198 bp), and Artemisia argyi (229,534 bp), but smaller than those of Arctium lappa (312,598 bp), Saussurea inversa (335,372 bp), and Lactuca sativa (363,324 bp). Despite considerable variation in the size and structure of plant mitochondrial genomes, their PCGs are generally conserved [57]. The mitochondrial genome of E. lindleyanum contains 32 unique PCGs. Compared with the 41 PCGs of the ancestral angiosperm mitochondrial genome, E. lindleyanum lacks rpl2, rps1, rps2, rps7, rps10, rps11, rps14, rps19, and sdh3, suggesting gene loss or transfer events during evolution [58].
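The gene-loss statement can be checked arithmetically: removing the nine listed genes from a 41-gene ancestral set leaves 32, matching the reported PCG count. The sketch below assumes the 41-gene set commonly cited for ancestral angiosperm mitogenomes; the exact reference list used by the authors is not given in this section, so the retained set computed here should be compared against the actual annotation rather than taken as the annotated gene list.

```python
# Assumed reference: the commonly cited 41-gene ancestral angiosperm mitochondrial PCG set.
ANCESTRAL_PCGS = {
    "atp1", "atp4", "atp6", "atp8", "atp9",
    "ccmB", "ccmC", "ccmFC", "ccmFN",
    "cob", "cox1", "cox2", "cox3", "matR", "mttB",
    "nad1", "nad2", "nad3", "nad4", "nad4L", "nad5", "nad6", "nad7", "nad9",
    "rpl2", "rpl5", "rpl10", "rpl16",
    "rps1", "rps2", "rps3", "rps4", "rps7", "rps10", "rps11", "rps12",
    "rps13", "rps14", "rps19",
    "sdh3", "sdh4",
}
# Genes reported in the text as absent from the E. lindleyanum mitogenome.
ABSENT = {"rpl2", "rps1", "rps2", "rps7", "rps10", "rps11", "rps14", "rps19", "sdh3"}


def retained_genes(ancestral: set, absent: set) -> set:
    """Genes expected to remain after removing the absent ones from the reference set."""
    unknown = absent - ancestral
    if unknown:
        raise ValueError(f"not in the reference set: {sorted(unknown)}")
    return ancestral - absent


retained = retained_genes(ANCESTRAL_PCGS, ABSENT)
print(len(ANCESTRAL_PCGS), len(ABSENT), len(retained))  # 41 9 32
```

The guard against unknown names catches spelling mismatches (for example, differing capitalization of ccmFC) that would otherwise silently inflate the retained count.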
Mitochondrial genomes frequently acquire and integrate foreign DNA during evolution [59, 60], and transfer of plastid DNA into the mitochondrial genome is common in plants. In the mitochondrial genome of E. lindleyanum, we detected 15 MTPTs with a total length of 11,245 bp, accounting for 3.76% of the genome. This total is longer than those of Angelica biserrata (7,914 bp; 3.46%) and Viburnum chinshanense (9,902 bp; 1.54%) but shorter than those of Selenicereus monacanthus (46,496 bp; 2.03%), Dendrobium wilsonii (79,909 bp; 10.5%), and Panax notoginseng (20,632 bp; 3.11%) [51, 54, 61–63]. No evident correlation was found between the total length of MTPTs and the size of the mitochondrial genome.
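The MTPT comparison mixes two measures, absolute length and share of the mitogenome, which rank species differently because their genome sizes differ. The short sketch below uses only the values quoted above: it recomputes the E. lindleyanum share and sorts the six species by each measure. The helper names are illustrative.

```python
# Values quoted in the text: (total MTPT length in bp, reported % of the mitogenome)
MTPT = {
    "Eupatorium lindleyanum": (11_245, 3.76),
    "Angelica biserrata": (7_914, 3.46),
    "Viburnum chinshanense": (9_902, 1.54),
    "Selenicereus monacanthus": (46_496, 2.03),
    "Dendrobium wilsonii": (79_909, 10.5),
    "Panax notoginseng": (20_632, 3.11),
}


def percent_of_genome(mtpt_bp: int, genome_bp: int) -> float:
    """Share of the mitogenome occupied by MTPTs, in percent."""
    return 100 * mtpt_bp / genome_bp


# Rank species by absolute MTPT length and by reported percentage (largest first)
by_bp = sorted(MTPT, key=lambda sp: MTPT[sp][0], reverse=True)
by_pct = sorted(MTPT, key=lambda sp: MTPT[sp][1], reverse=True)

print(f"{percent_of_genome(11_245, 299_285):.2f}%")  # 3.76%
print(by_bp)
print(by_pct)
```

With these figures, E. lindleyanum ranks fourth of six by absolute MTPT length but second by percentage, which illustrates why the two measures should not be used interchangeably.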
Analysis of codon usage bias provides insights into evolutionary processes and trends, because codon usage is linked to amino acid composition, protein structure, and organismal adaptability [55, 64]. In the mitochondrial genome of E. lindleyanum, rpl16 starts with GTG and cox1 with ACG, whereas all other PCGs start with ATG. Codons preferentially end with A or T, consistent with the codon usage bias observed in many terrestrial plants [65, 66]. As in the mitochondrial genome, plastid PCGs exhibited a pronounced preference for A/T-ending codons.
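This section does not name the codon usage metric, but relative synonymous codon usage (RSCU) is a widely used one: the observed count of a codon divided by the count expected if all synonymous codons for its amino acid were used equally, so RSCU > 1 marks a preferred codon. The Python sketch below, an illustration rather than the authors' pipeline, computes RSCU under the standard genetic code (treating the three stop codons as one synonymous family, an assumption) together with the fraction of A/T-ending codons that underlies the preference described above.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
FAMILY_SIZE = Counter(CODE.values())  # number of synonymous codons per amino acid (or stop)


def codons(cds: str) -> list:
    """Split an in-frame coding sequence into complete codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def rscu(cds: str) -> dict:
    """RSCU = observed count x family size / total count of that family's codons."""
    counts = Counter(c for c in codons(cds) if c in CODE)
    totals = Counter()
    for codon, n in counts.items():
        totals[CODE[codon]] += n
    return {c: n * FAMILY_SIZE[CODE[c]] / totals[CODE[c]] for c, n in counts.items()}


def at_ending_fraction(cds: str) -> float:
    """Fraction of complete codons whose third base is A or T."""
    cs = codons(cds)
    return sum(c[2] in "AT" for c in cs) / len(cs) if cs else 0.0


# Toy CDS (not a real gene): ATG TTA TTA TTG TAA
print(rscu("ATGTTATTATTGTAA"))
print(at_ending_fraction("ATGTTATTATTGTAA"))  # 0.6
```

In practice RSCU is computed over all PCGs pooled together, since single genes give too few codons per family for stable estimates.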
Repetitive sequences, which are abundant in mitochondrial genomes, play roles in adaptation, gene expression regulation, and genome structure evolution [67–69]. In addition, repeat sequences can serve as molecular markers in population genetic and evolutionary studies, providing valuable genetic information [17]. In E. lindleyanum, we identified 73 SSRs, 31 tandem repeats, and 274 dispersed repeats. The number of repeat sequences varies considerably among published mitogenomes. For instance, the mitochondrial genome of Artemisia argyi contains 65 SSRs, 14 tandem repeats, and 159 dispersed repeats, whereas that of Selenicereus monacanthus contains far more, with 616 SSRs, 94 tandem repeats, and 4,459 dispersed repeats [54, 66]. The numbers of repeat sequences in the mitochondrial genomes of other species, such as Viburnum chinshanense, Aquilegia amurensis, and Ilex metabaptista, are similar to those of E. lindleyanum [19, 55, 61]. Research suggests that the abundance of repeat sequences is closely associated with mitochondrial genome structure and size, as well as with gene expression regulation [70]. The mitochondrial genome of E. lindleyanum is smaller than that of Carthamus tinctorius but larger than that of Artemisia argyi [66, 71], and, interestingly, the SSR counts of these three species follow the same order as their genome sizes. Similarly, the mitochondrial genome of Selenicereus monacanthus is approximately 7.6 times larger than that of E. lindleyanum, is structurally more complex, and contains roughly 8 times as many SSRs [54]. These findings are consistent with the hypothesis that repeat sequences may play a crucial role in shaping genome size and structural complexity. Intriguingly, although repetitive sequences are less abundant in the plastid genome than in the mitochondrial genome, three long tandem repeats in the plastid genome lie within coding regions (rps18, ycf1, and ycf2), whereas the mitochondrial repeats are strictly intergenic.
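SSR counts depend on the minimum number of motif copies required for each motif length, which this section does not report. The sketch below is a MISA-style scan written from scratch, assuming thresholds often used in plant organelle studies (10, 6, 5, 5, 5, and 5 copies for mono- to hexanucleotide motifs). Motifs that are themselves repeats of a shorter unit (such as `AA` or `ATAT`) are skipped so that each SSR is reported under its shortest motif; compound SSRs and some edge cases at repeat boundaries are not handled.

```python
import re

# Minimum tandem copies per motif length (assumed MISA-style thresholds, not from the study)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list:
    """Return (start, end, motif, copies) for perfect SSRs, 0-based half-open coordinates."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-base motif followed by at least (min_copies - 1) identical copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


# Toy sequence (not from E. lindleyanum) containing one mono-, di-, and trinucleotide SSR
demo = "CCC" + "A" * 12 + "GC" + "AT" * 7 + "TT" + "CAG" * 5 + "C"
for hit in find_ssrs(demo):
    print(hit)
```

Because the thresholds strongly affect the totals, SSR counts compared across studies (such as the 73 versus 65 versus 616 above) are most meaningful when the same thresholds were used.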
RNA editing is common in the mitochondrial genes of higher plants, where it alters genetic information at the mRNA level [3]. As an essential post-transcriptional regulatory mechanism, RNA editing modulates traits associated with plant cytoplasmic inheritance [72] and thereby helps shape traits related to mitochondrial activity and plant adaptation. In E. lindleyanum, 504 RNA editing sites were predicted, fewer than in Cymbidium ensifolium (530 sites), Viburnum chinshanense (623 sites), Ilex metabaptista (543 sites), and Artemisia argyi (566 sites), but more than in Punica granatum (466 sites) and Angelica biserrata (474 sites). Consistent with previous findings, all RNA editing events in E. lindleyanum occurred at the first or second codon position, predominantly the second [18]. All edits were C-to-U transitions, and most produced nonsynonymous codon changes. The six most frequent amino acid substitutions were serine to leucine (111 occurrences), proline to leucine (111), serine to phenylalanine (73), proline to serine (40), arginine to cysteine (37), and arginine to tryptophan (34). Among the genes, ccmB and mttB had the most editing sites (37 each), a pattern also observed in Artemisia argyi and Selenicereus monacanthus [54, 66]. Notably, two genes (cox1 and nad4L) were predicted to acquire their start codons, and two others (atp9 and ccmFC) their stop codons, through RNA editing. Previous studies suggest that such editing events may yield more conserved and more readily expressed proteins by creating start and stop codons [73].
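The substitution classes above follow directly from the genetic code. For example, a C-to-U edit at the second position turns UCA (Ser) into UUA (Leu), the same kind of edit turns ACG (Thr) into the start codon AUG (consistent with the ACG start of cox1 noted earlier), and an edit at the first position turns CAA (Gln) into the stop codon UAA. The sketch below applies a single C-to-U edit and classifies the result; the example codons are illustrative and are not taken from E. lindleyanum transcripts, and the function also accepts third-position edits even though none were reported here.

```python
from itertools import product

# Standard genetic code written with RNA bases; '*' marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def c_to_u_edit(codon: str, position: int) -> tuple:
    """Apply one C-to-U edit at codon position 1-3; return (aa_before, aa_after, kind)."""
    codon = codon.upper().replace("T", "U")
    if position not in (1, 2, 3) or codon[position - 1] != "C":
        raise ValueError(f"no C at position {position} of {codon}")
    edited = codon[:position - 1] + "U" + codon[position:]
    before, after = CODE[codon], CODE[edited]
    return before, after, "synonymous" if before == after else "nonsynonymous"


# Illustrative codons: Ser->Leu, Pro->Leu, Arg->Trp, Thr->Met (start), Gln->stop
for codon, pos in [("UCA", 2), ("CCC", 2), ("CGG", 1), ("ACG", 2), ("CAA", 1)]:
    print(codon, pos, c_to_u_edit(codon, pos))
```

Because C-to-U editing can only remove cytidines, every substitution in the list above (for example Pro to Leu via CCN to CUN) corresponds to a codon that contained C at the edited position.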
Comparative analysis of mitochondrial genomes among closely related species
To better understand the mitochondrial structure of E. lindleyanum, we performed mitochondrial genome collinearity analysis on ten Asteraceae species based on phylogenetic results (Fig. 8). In the phylogenetic tree, E. chinense, the species most closely related to E. lindleyanum, showed the highest mitochondrial genome similarity with it (Figs. 9 and 10). However, Ageratum conyzoides, another species closely related to E. lindleyanum in this study, exhibited a lower mitochondrial genome similarity. Surprisingly, the similarity was even lower than that of more distantly related species such as Carthamus tinctorius and Lactuca serriola.
Multiple studies suggest that extensive rearrangements in mitochondrial genomes during evolution are the primary driver of their structural diversity [62, 74–76]. Moreover, these rearrangements may be closely linked to adaptive evolution in plants. Research indicates that mitochondrial genome rearrangements can aid plants in adapting to various ecological environments by influencing key functions such as energy metabolism and material transport [77]. As Eupatorium species are widely distributed, their mitochondrial genomes may have undergone multiple rearrangements under environmental selection pressures, enhancing their adaptability to diverse habitats. The significant structural differences observed in the mitochondrial genome of E. lindleyanum in this study could be a result of such adaptive evolution. Future research should conduct multi-species comparative genomics to explore the mechanisms of genome rearrangement and its impact on phylogeny and species diversification.
We further compared the mitochondrial genomes of E. lindleyanum and its congener E. chinense in detail. Both species have single circular mitochondrial genomes of similar length (differing by only 13,125 bp), yet their genome structures differ markedly, highlighting the high structural diversity of mitochondrial genomes within Eupatorium. Despite these structural differences, the protein-coding gene content of the two species is nearly identical; the only exception is rps14, which is absent from the mitochondrial genome of E. lindleyanum. This suggests that rps14 may have undergone functional loss or intergenomic transfer during the evolution of E. lindleyanum. Functional compensation, such as expression of a nuclear-encoded protein that replaces the missing gene product, may maintain normal mitochondrial function [78]. Future transcriptomic and proteomic analyses could clarify how the absence of rps14 is compensated in E. lindleyanum.
The potential of mitochondrial genomes in phylogenetic reconstruction
In previous studies, phylogenetic analyses have primarily relied on nuclear and plastid genome sequences [79, 80]. The limitations of mitochondrial sequences in phylogenetic studies stem from their relatively low mutation rates. Studies have shown that nuclear genomes evolve the fastest, plastomes at half that rate, and plant mitogenomes at less than one-sixth the rate of nuclear genomes [81, 82]. However, compared to plastid and nuclear genomes, mitochondrial genomes have unique evolutionary histories, offering novel insights into phylogenetic analyses. Inconsistencies between phylogenies based on mitochondrial, plastid, or nuclear genomes can reflect complex evolutionary processes such as organelle capture and hybridization [83, 84]. With advancements in high-throughput sequencing, mitochondrial genome data are now more accessible, and their application in studying plant phylogenetic relationships and evolutionary patterns is becoming increasingly widespread.
To explore the potential of mitochondrial genome data in reconstructing phylogenetic relationships, we constructed phylogenetic trees for Campanulids and Asteraceae species (Figs. 7 and 8). The mitochondrial genome-based phylogeny successfully resolved the relationships among 19 species of Campanulids and two outgroups, grouping them accurately into four clades corresponding to Aquifoliales, Apiales, Dipsacales, and Asterales. High bootstrap support values observed in most branches highlight the reliability of mitochondrial genome data in resolving higher-level taxonomic relationships. Furthermore, the consistency of our results with previous studies supports the robustness of mitochondrial datasets in reflecting order-level evolutionary history.
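Bootstrap support, cited for both the Campanulid and Asteraceae trees, is obtained by resampling alignment columns with replacement, re-inferring a tree from each pseudo-replicate, and reporting the percentage of replicate trees that contain a given clade. The sketch below covers only the resampling and tallying steps; tree inference for each replicate is assumed to be done by an ML program and is not shown, and all names and toy data are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One nonparametric bootstrap pseudo-replicate: resample columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def clade_support(replicate_clades: List[Set[FrozenSet[str]]], clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (each given as a set of clades) that contain `clade`."""
    hits = sum(clade in clades for clades in replicate_clades)
    return 100 * hits / len(replicate_clades)


# Toy usage with hypothetical data
aln = {"a": "ACGTAC", "b": "ACGAAC", "c": "TCGATC"}
rng = random.Random(42)
replicates = [bootstrap_replicate(aln, rng) for _ in range(3)]
# In practice each replicate would be passed to an ML program; these clade sets are invented.
trees = [{frozenset({"a", "b"})}, {frozenset({"a", "b"})}, {frozenset({"a", "c"})}, {frozenset({"a", "b"})}]
print(clade_support(trees, frozenset({"a", "b"})))  # 75.0
```

Because whole columns are resampled, a dataset with few variable sites (as expected for slowly evolving mitochondrial PCGs) yields replicates that differ little in signal, which can still produce unstable support for short internal branches.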
However, our analysis of 37 Asteraceae species using mitochondrial and plastid genomes revealed notable differences in resolving lower-level phylogenetic relationships. Although both trees showed high overall bootstrap support, the plastid tree resolved genus-level relationships more clearly, with congeneric species forming well-supported monophyletic groups. In contrast, the mitochondrial tree did not recover some genera as monophyletic (e.g., Chrysanthemum) and showed lower support for several branches. In addition, differences in branching positions between the two trees suggest that mitochondrial PCG data may be less effective at resolving relationships among closely related species.
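One way to quantify such topological discrepancies, not used in this study, is the Robinson–Foulds distance: the number of clades (here, the leaf sets of internal nodes in rooted trees) found in one tree but not the other. The sketch below computes the rooted version on toy five-taxon topologies loosely modelled on the pink-highlighted clade in Fig. 8, whose sister group switches between Bidens and Helianthus; these are not the published trees, and the function names are hypothetical.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees (rooted tree)


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes, excluding the root and single leaves."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # the root cluster is shared by every tree on these taxa
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Rooted Robinson-Foulds (cluster) distance: clusters present in only one tree."""
    return len(clusters(t1) ^ clusters(t2))


# Toy topologies (hypothetical): the Eupatorium/Ageratum clade has a different sister group
mito = (((("E_lindleyanum", "E_chinense"), "A_conyzoides"), "Bidens"), "Helianthus")
plastid = (((("E_lindleyanum", "E_chinense"), "A_conyzoides"), "Helianthus"), "Bidens")
print(rf_distance(mito, plastid))  # 2
```

A single change of sister group contributes two to the distance (one cluster lost, one gained), so the metric counts conflicting clades rather than weighting them by bootstrap support.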
These discrepancies likely result from the unique evolutionary characteristics of mitochondrial genomes, including low mutation rates, frequent structural rearrangements, and potential gene introgression or hybridization events, which may obscure phylogenetic signals. Moreover, the relatively limited number of protein-coding genes available in mitochondrial genomes compared to plastid genomes may further constrain their resolving power. Despite these limitations, our study demonstrates the utility of mitochondrial genome data in providing complementary insights into plant phylogenetics. Future research should incorporate broader taxonomic sampling to comprehensively evaluate the strengths and limitations of mitochondrial data in resolving phylogenetic relationships across different taxonomic levels.
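A back-of-the-envelope calculation shows how the two limitations named above, low substitution rates and fewer genes, compound. Assuming, purely for illustration, an infinite-sites model with a constant per-site substitution rate $\mu$, two lineages that diverged $t$ years ago are expected to differ at about $2\mu L t$ sites in an alignment of length $L$. Taking the relative rates quoted earlier in this section (plastome about half the nuclear rate, mitogenome less than one-sixth of it) and the same divergence time for both datasets:

```latex
\mathbb{E}[S] \approx 2\,\mu\,L\,t,
\qquad
\frac{\mathbb{E}[S_{\mathrm{mt}}]}{\mathbb{E}[S_{\mathrm{pt}}]}
  = \frac{\mu_{\mathrm{mt}}}{\mu_{\mathrm{pt}}}\cdot\frac{L_{\mathrm{mt}}}{L_{\mathrm{pt}}}
  < \frac{\mu_{\mathrm{nuc}}/6}{\mu_{\mathrm{nuc}}/2}\cdot\frac{L_{\mathrm{mt}}}{L_{\mathrm{pt}}}
  = \frac{1}{3}\cdot\frac{L_{\mathrm{mt}}}{L_{\mathrm{pt}}}
```

Because the mitochondrial dataset comprises 28 PCGs versus 79 plastid PCGs, $L_{\mathrm{mt}}$ is probably also smaller than $L_{\mathrm{pt}}$ (the aligned lengths are not reported in this section), so the expected number of informative differences between closely related species is reduced on both counts. Saturation, rate heterogeneity among sites, and structural rearrangements are ignored in this simplification.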
Conclusion
This study represents the first complete sequencing and assembly of the mitochondrial genome of E. lindleyanum, revealing its typical single circular structure and its genomic differences from related species in the same family and genus. Analyses of genome size, repeat sequences, MTPT fragments, RNA editing sites, and codon usage bias highlight both the diversity and the conservation of the E. lindleyanum mitochondrial genome during evolution. Notably, the findings on repeat sequences and genomic rearrangements point to their potential key roles in plant adaptive evolution. Comparative analysis with E. chinense revealed the high structural diversity of mitochondrial genomes within Eupatorium and identified potential avenues for studying gene loss and compensatory mechanisms. Phylogenetic analysis showed that mitochondrial genome data can provide valuable insights into plant evolutionary patterns and lineage relationships, particularly at higher taxonomic levels. The comprehensive data provided by this study serve as a foundational reference for mitochondrial genome research within Eupatorium, potentially guiding further studies on other species in the genus. Moreover, this work establishes a basis for exploring evolutionary mechanisms and environmental adaptability in Eupatorium species and offers new perspectives on the application of mitochondrial genomes in plant phylogenetics and genomics.
Supplementary Information
Additional files 1–13
