Structural Remodeling and Enzymatic Replacement Shape the Evolution of Organellar Group II Introns in Ulva
Feng Liu, Shuangle Jin, Huiyin Song

TL;DR
This study explores how group II introns evolve in the compact organellar genomes of Ulva algae, revealing structural and enzymatic changes that shape their diversity.
Contribution
The paper identifies lineage-specific structural remodeling and independent enzymatic replacements in organellar group II introns of Ulva.
Findings
Most intron families retain the canonical six-domain structure, but some mitochondrial IIA families have a seven-domain configuration due to lineage-specific insertions.
Phylogenetic analysis suggests multiple independent invasion events for LHE-encoding introns within IIB lineages.
The chloroplast infA-62 family represents a vertically transmitted lineage with stable inheritance and scaffold preservation.
Abstract
Group II introns are catalytic RNAs that combine self-splicing ribozyme activity with mobility and have played major roles in shaping organellar genome evolution. In green macroalgae of the genus Ulva, organellar genomes are highly compact, yet they harbor unusually diverse and dynamic repertoires of group II introns. To understand how organellar group II introns diversify and persist within compact organellar genomes, we performed a comparative analysis of mitochondrial and chloroplast group II introns across Ulva, integrating secondary structure reconstruction, intron occurrence patterns, and phylogenetic inference based on both conserved intron RNA regions and intron-encoded proteins (IEPs), including reverse transcriptase/maturase (RT/M) and LAGLIDADG homing endonuclease (LHE). A total of 168 mitochondrial and 123 chloroplast introns were identified and classified into 32 families…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8- —National Natural Science Foundation of China
- —Shandong Provincial Key Research and Development Program Project
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRNA and protein synthesis mechanisms · RNA Research and Splicing · Photosynthetic Processes and Mechanisms
1. Introduction
Group II introns are a distinctive class of catalytic RNAs that have attracted long-standing interest as model systems for investigating RNA catalysis and intron mobility within genomes [1,2]. Their RNA scaffolds adopt a conserved six-domain architecture (DI–DVI), within which structural and catalytic functions are partitioned rather than evenly distributed [3,4]. DI serves as the main scaffold for intron folding, DV houses the catalytic core, and DVI contains the branch-point adenosine required for lariat formation during excision [5,6]. This unusual combination of ribozyme activity and mobility has positioned group II introns as a central model system for dissecting RNA structure-function relationships [7,8] and for understanding the evolutionary transition from self-splicing introns to the spliceosomal machinery [6,9].
In algae, fungi, and plants, group II introns have played important roles in shaping organellar genome evolution [10]. They are typically protein-assisted rather than fully autonomous self-splicing elements [11,12], and their persistence is increasingly interpreted as being mediated by host splicing-factor networks [13,14]. In addition to their catalytic RNA components, many group II introns encode intron-encoded proteins (IEPs), most commonly reverse transcriptase/maturase (RT/M) proteins and, less frequently in mitochondrial genomes of some fungi and green algae [10,15], LAGLIDADG homing endonucleases (LHEs) [16]. RT/M proteins promote both splicing and mobility through their maturase activity and reverse transcription [17], whereas LHEs facilitate intron propagation by introducing site-specific DNA double-strand breaks [18,19]. Accumulating evidence from bacterial and organellar systems indicates that group II introns and their associated IEPs evolve as coupled RNA-protein units [20,21].
Mitochondria and chloroplasts usually exhibit compact, streamlined genomes, yet they often retain complex intron repertoires that can be lineage-specific and highly dynamic [22,23]. The evolutionary persistence of organellar group II introns is generally considered to reflect a balance between intron mobility, functional constraints on splicing, and progressive decay [11,12]. Active introns can spread via retrohoming mediated by conserved exon-binding site (EBS)/intron-binding site (IBS) interactions and functional IEPs [1,24]. However, degeneration of structural domains or IEPs can reduce mobility, favor persistence at specific loci, or ultimately lead to intron loss [19,20].
Green macroalgae of the genus Ulva (Ulvophyceae, Chlorophyta) are ecologically important coastal primary producers and are well known for forming large-scale green tides [25,26]. Recent studies have shown that both mitochondrial and chloroplast genomes in Ulva are compact but unusually dynamic, exhibiting extensive variation in genome architecture, gene order, and intron content. Ulva organellar introns are characterized by extensive family diversity, pronounced structural variability, and frequent gain and loss, even among closely related taxa (e.g., Ulva prolifera and Ulva linza) [22,23,26]. Previous work has documented the abundance of group II introns in Ulva organellar genomes [27,28], but these introns have largely been examined in isolation, rather than as components of an integrated evolutionary system.
Despite the growing number of available Ulva organellar genomes (Supplementary Tables S1 and S2), it remains unclear how group II introns diversify and persist over evolutionary time within these compact genomes. In particular, it is unknown whether organellar group II intron evolution is dominated by unidirectional degeneration or instead involves recurrent structural innovation, enzymatic replacement, and differential distribution patterns across genomic compartments. Ulva therefore provides a useful model system to examine how mobile ribozymes persist under extreme genome compaction. Here, we performed a comparative analysis of organellar group II introns across Ulva mitochondrial and chloroplast genomes, integrating intron family classification, secondary structure reconstruction, occurrence-frequency analysis, and phylogenetic inference based on both conserved intron RNA scaffolds and IEPs.
2. Results
2.1. Diversity and Structural Classification of Ulva Organellar Group II Introns
A total of 168 mitochondrial and 123 chloroplast group II introns were identified (Supplementary Tables S1 and S2) and classified into 32 intron families according to shared insertion sites and sequence homology among Ulva organellar genome sequences (Supplementary Table S3). A total of 20 mitochondrial intron families were defined, including 14 families encoding RT/M proteins, five families encoding LHEs, and one family (cox1-686), which was classified as an IEP-lacking IIB-LHE intron [26] based on the presence of a 5′ terminal linker sequence, which is characteristic of some IIB1-LHE introns [29]. A total of 12 chloroplast intron families were identified, including 10 families encoding RT/M proteins, one family (orf185-47) assigned as an IEP-lacking IIB2-RT/M intron and one IIB-like family (infA-62), which has completely lost its IEP [28].
Secondary structure reconstruction based on thermodynamic folding predictions and manual refinement showed that the vast majority of Ulva organellar group II introns conform to the canonical six-domain (DI–DVI) secondary structure model (Figure 1). However, four mitochondrial IIA intron families (atp1-990, cox1-643, cox1-760, and nad5-1057) exhibited an additional stem-loop structure inserted between DIII and DIV, resulting in a non-canonical seven-domain architecture (Figure 1A). This additional element, here designated DIIIa, represents a distinct structural elaboration within a subset of mitochondrial IIA families.
Intron classification was performed by integrating diagnostic secondary-structure features with the identities of their associated IEPs. Group IIA introns were subdivided into IIA1 and IIA2 subgroups based on characteristic junction motifs between DII and DIII (IIA1: BVGA; IIA2: RRGA) and additional conserved structural features [30]. Group IIB introns were assigned to IIB1 and IIB2 subgroups according to diagnostic motifs at the DIV-DV junction (IIB1: YW; IIB2: RG), the absence or presence of the specific domain Ia in the 5′ strand of domain I(i)/I(ii) (with noted lineage-specific exceptions), and other conserved structural sites [30]. With the exception of the previously described infA-62 family, which belongs to the IIB-like subgroup, all identified intron families could be assigned to six principal subgroups, namely IIA1-RT/M, IIA2-RT/M, IIB1-RT/M, IIB1-LHE, IIB2-RT/M, and IIB2-LHE (Supplementary Table S3). Across both mitochondrial and chloroplast genomes, IIB intron families were more abundant than IIA intron families.
2.2. Variations in Length and GC Content Among Organellar Group II Introns
Marked differences in intron length were observed among intron lineages and between organellar compartments (two-way ANOVA, lineage effect p < 0.01; organelle effect p < 0.01; lineage × organelle interaction p < 0.01) (Figure 2A). Mitochondrial RT/M-encoding introns were the largest, with an average length of 2449 nt, whereas mitochondrial LHE-encoding introns were substantially shorter, with an average length of 1364 nt (Supplementary Table S4). Chloroplast RT/M-encoding introns occupied an intermediate size range with an average length of 2281 nt (Figure 2A). The chloroplast infA-62 family (IIB-like) was the shortest, with an average length of 615 nt (Supplementary Table S4), which is consistent with its highly degenerated structure [28]. In both mitochondrial and chloroplast genomes, the average length of IIA-RT/M introns was consistently longer than that of IIB-RT/M introns (p < 0.01, post hoc comparisons following two-way ANOVA) (Figure 2A). This difference is likely attributable to the presence of an intact RT/M open reading frame (ORF) in all IIA-RT/M introns, whereas the RT/M ORF in several IIB-RT/M introns showed varying degrees of degeneration.
Length variation among intron families was not only associated with concerted changes across structural domains (DI–DIV and DVI), but also involved the insertion of additional accessory domains. Several intron families displayed extreme size variation within specific domains. The rnl-2698 family possesses the most compact DI (257–287 nt) due to the complete absence of the domains IC2 and ID2 (Supplementary Table S5). The cox1-643 family exhibited the shortest DII (22 nt) among the II-RT/M families, and the rnl-2080 family contained the smallest DII (14–19 nt) among the II-LHE families. Four IIB2-RT/M families (atpB-627, atpI-256, orf185-47, and petB-69) and one IIA2-RT/M family (petD-87) contained a highly degenerated DIII (12–18 nt) (Supplementary Table S5). The infA-62 family harbored the shortest DIV (24–32 nt), while the 3081-nt atp1-1316, which was detected exclusively in a single Ulva species (MN853878), contained the longest DIV (2445 nt) among all families. Within most intron families, intrafamilial length variation was primarily driven by dramatic changes in DIV, particularly those associated with IEP degeneration. Some intron families showed lineage-specific divergence involving multiple domains. For instance, the nad3-216 family exhibited pronounced length variation (1634–2560 nt), largely attributable to the highly plastic DI (371–430 nt) and DIV (931–1864 nt) (Supplementary Table S5).
GC content analyses revealed clear lineage- and organelle-dependent patterns among group II introns (two-way ANOVA, lineage effect p < 0.01; organelle effect p < 0.01). Mitochondrial RT/M-encoding introns exhibited the highest GC content (avg. 45.56%), whereas chloroplast RT/M-encoding introns were consistently less GC-rich (avg. 36.09%) (Figure 2A). Mitochondrial LHE-encoding introns showed intermediate GC levels (avg. 35.55%) and the chloroplast infA-62 displayed the lowest GC content (avg. 22.73%). Most intron families displayed GC contents that differed substantially from those of their host organellar genomes (Figure 2B and Supplementary Table S4), suggesting partial decoupling between intron and host genome compositional evolution. The GC content of all RT/M-encoding intron families and nearly all LHE-encoding families (except cox1-874) was significantly higher than that of the host genome (p < 0.01). In contrast, the GC content of the chloroplast infA-62 family was lower than that of the chloroplast genomes (p < 0.01) (Figure 2 and Supplementary Table S4). These differences may reflect strong selection pressure toward AT enrichment in chloroplast introns, especially the infA-62 family. However, mitochondrial introns showed significantly different evolutionary patterns, and their compositional difference reflected a comprehensive result involving selective pressure, neutral decay, and modular enzymatic substitution.
2.3. Structural Remodeling of Group II Intron Domains
Comparative analyses of secondary structures revealed extensive structural plasticity across Ulva organellar group II introns, with DI representing the primary hotspot for structural remodeling. Multiple DI substructures exhibited lineage-specific loss, reduction, or modification. In contrast, the catalytic core domain (DV) remained highly conserved across all intron families, typically measuring ~34 nt in length, with only minor deviations observed in a few lineages (Supplementary Table S5).
Patterns of conservation differed markedly between intron types. All RT/M-encoding intron families retained a complete set of EBS tertiary interaction motifs, indicating strong conservation of the retrohoming recognition interface. In contrast, LHE-encoding families displayed heterogeneous patterns. Some families (e.g., cox1-874, rns-670, cox1-686) retained the full set of EBS motifs (EBS1–EBS3), whereas others (e.g., rnl-2080, rns-420 and 90.0% of rnl-2698) showed partial erosion of recognition motifs, most notably the loss of EBS2 in certain lineages (Figure 3). Similarly, domain Ia, typically regarded as a diagnostic feature of IIB2 introns, was retained in most IIB2 families but absent in several lineages, including the chloroplast petB-23 family and many members (76.0%) of the mitochondrial nad3-216 family. In addition, small Ia-like elements were detected sporadically in a few IIB1 intron families (petB-169 and petB-277) (Figure 3). These observations indicate that canonical diagnostic features of group II introns can undergo lineage-specific modification or erosion.
Loss and modification of DI substructures were widespread among intron families. Complete loss of domain IC2 occurred in several families of IIA1, IIA2, IIB1, and IIB2 lineages, while domain ID2 was absent in mitochondrial LHE-encoding families and in the chloroplast petB-23 family (Figure 3). In addition to domain loss, several lineages displayed structural insertions within DI. An additional stem-loop element (ID2a) was inserted between domains ID2 and ID3 in a subset of IIB1 and IIB2 families, whereas four mitochondrial IIA families (atp1-990, atp1-1095, atp1-1316, and nad5-1057) consistently contained an extra stem-loop structure (domain IE) inserted between domain ID and the terminal I(ii). These patterns highlight the modular nature of DI, which appears capable of accommodating both structural loss and lineage-specific innovation.
Terminal boundary motifs of introns were generally conserved but showed lineage-dependent variation. The canonical terminal motif (GUGCGA/AY) was largely maintained across most intron families, although mitochondrial IIA introns exhibited several substitutions restricted to the 5′ terminal motif, whereas the chloroplast IIA family petD-87 remained highly conserved at both termini (Supplementary Table S6). Most IIB-RT/M introns retained a conserved 5′ terminal motif, while several IIB-LHE and chloroplast intron families displayed modifications at one or both boundaries. A distinctive feature was observed in several mitochondrial IIB-LHE intron families (e.g., cox1-686, rnl-2698, and 35.3% of rns-420), in which a short 5′ terminal linker sequence was inserted between the upstream exon and the conserved 5′ terminal motif. The presence of this linker was consistently associated with loss of the canonical branch-point adenosine in DVI among mitochondrial IIB1-LHE intron families (Figure 3 and Supplementary Table S6). This coordinated modification suggests a structural signature linked to degeneration and reduced mobility in certain LHE-encoding lineages.
These lineage-specific modifications reveal an asymmetric pattern of constraint across the group II intron scaffold. The catalytic core domain (DV) is the most conserved element among Ulva organellar introns, whereas DI represents the principal hotspot of remodeling through recurrent loss, reduction, or insertion of peripheral substructures (e.g., IC2/ID2 loss and ID2a/IE/DIIIa insertions) (Figure 3). Structural variability differs among intron subclasses. RT/M-encoding introns consistently retain a complete EBS-IBS recognition interface, whereas LHE-encoding introns exhibit heterogeneous erosion of key motifs, most prominently the recurrent loss of EBS2. In mitochondrial IIB1-LHE lineages, this erosion is accompanied by the characteristic insertion of a 5′ terminal linker and the concurrent loss of the canonical branch-point adenosine in DVI. Additional lineage-specific signatures include pronounced DIII truncation in several IIB2-RT/M families and the presence of a derived DIIIa insertion shared by a subset of mitochondrial IIA1 introns. These patterns support a model in which a strongly conserved catalytic core is embedded within a permissive RNA scaffold whose peripheral domains can undergo both degeneration and innovation in a lineage- and IEP-dependent manner.
2.4. Phylogenetic Analysis Based on Conserved Intron RNA Sequences and Intron Occurrence Frequency
A phylogenetic tree was reconstructed using the maximum likelihood method based on conserved RNA sequences shared by Ulva organellar group II introns. The resulting topology clearly resolved these organellar intron families into three major evolutionary clades corresponding to IIA, IIB and IIB-like introns (Figure 3 and Supplementary Figure S1). The infA-62 family, representing an IIB-like lineage, was positioned as an outgroup, reflecting its highly divergent evolutionary status.
Within the IIA clade, four mitochondrial intron families characterized by a seven-domain (DIIIa-containing) architecture (atp1-990, cox1-643, cox1-760, and nad5-1057) tightly clustered together to form a well-supported subclade (Figure 3). Given that these four families contained the unique DIIIa, this topology supports a lineage-specific insertion event of DIIIa that occurred within the subclade. This subclade clustered with another subclade formed by four mitochondrial intron families (atp1-1095, cob-877, cox1-199, and cox2-424), tightly constituting the IIA1 lineage. The IIA1 lineage was resolved as a sister lineage to the IIA2 lineage comprising atp1-1316 and petD-87 (Figure 3).
Within the IIB clade, two mitochondrial II-LHE intron families (rnl-2698 and rns-420) and four mitochondrial II-RT/M families (rnl-1963, rns-780, petB-277, and petB-169) clustered together, forming the IIB1 lineage (Figure 3). The IIB2 lineage was composed of three chloroplast-derived and one mitochondria-derived subclades, which did not form an independent monophyletic clade. The phylogenetic topology supports the derivation of IIB2 introns from within the IIB1 lineage. The mitochondrial II-LHE intron families were divided into two subgroups: two (rnl-2698 and rns-420) in the IIB1 lineage and four (cox1-686, cox1-874, rnl-2080, and rns-670) in the IIB2 lineage. Four chloroplast IIB2-RT/M families (atpB-627, atpI-256, orf185-47, and petB-69), which shared an extremely degenerated DIII, clustered together, forming a well-supported subclade, indicating the small DIII was a lineage-specific characteristic of this subclade.
Analysis of intron occurrence frequency across Ulva species revealed substantial variation among intron families both within and among species, in sharp contrast to the stable presence of infA-62 across all sampled chloroplast genomes. Except for infA-62, the intron families with the highest occurrence frequencies (>40%) were exclusively group IIB2-RT/M introns, including nad3-216 (52.3%), petB-69 (48.9%), cox1-874 (47.7%), and rns-670 (45.5%) (Figure 3). By contrast, the RT/M-encoding intron families with low occurrence frequencies (<15%) were frequently accompanied by pronounced structural decay, such as the complete loss of domain IC2 in atp1-1095, atp1-1316, rns-780, atpB-537, and psbC-496, or the loss of domain ID2 in petB-23. All IIA1 introns harboring the specialized DIIIa exhibited relatively low occurrence frequencies (4.5–15.9%) (Figure 3). Organellar group II introns in Ulva, irrespective of whether they encode RT/M or LHE, retain dynamic distribution both within and among species, reflecting an evolutionary pattern of recurrent intron gain and loss.
The two chloroplast IIB1-RT/M families, petB-169 and petB-277, exhibited highly similar primary sequences and secondary structures but differed mainly at the EBS-IBS recognition interfaces (Figure 4A). Divergence was concentrated in the EBS regions, particularly EBS1 and EBS2 (Figure 5A), consistent with the recognition of distinct IBS targets (IBS1 and IBS2) and the occupancy of different genomic loci. Overall scaffold conservation coupled with interface-specific divergence suggests insertion-site switching between these closely related families.
The two mitochondrial IIB2 families, cox1-686 and cox1-874, show high similarity in structure (Figure 4B) and a close relationship in the phylogenetic tree (Figure 3). The cox1-686 has a 5′ terminal linker sequence, which is a typical characteristic of group IIB-LHE introns, but completely lost its IEP. Combined with their occurrence frequency and novel EBS variants, these data suggest that cox1-686 may have been derived from the cox1-874 lineage and inserted into another genomic locus, followed by complete IEP loss (Figure 5B).
2.5. Phylogenetic Analysis Based on Conserved IEP Sequences
All group IIA introns harbored an intact intron-encoded RT/M ORF, whereas some members of most group IIB intron families had a degenerated RT/M or LHE ORF, or even complete loss. No RT/M ORFs or recognizable RT/M remnants were detected in any IIB-LHE introns. The IIB introns in mitochondria tend to undergo degeneration more frequently than those in chloroplasts. Apart from rnl-2698, IEP degeneration was evident across all mitochondrial IIB families, whereas in chloroplast IIB families such progressive loss of IEP was restricted to orf185-47 and only some members of petB-69 (13.0%) (Figure 3). Phylogenetic analysis based on amino acid sequences of RT/M proteins revealed that RT/M proteins encoded by organellar group II introns in Ulva clustered into two clearly separated evolutionary clades corresponding to IIA and IIB introns (Figure 6A and Supplementary Figure S2). The overall topology of the IIA and IIB clades was largely consistent with the RNA-based phylogeny. These results supported a coevolutionary pattern between II-RT/M introns and their associated RT/M proteins within Ulva organellar genomes.
Phylogenetic analysis based on conserved amino acid sequences of LHEs revealed that LHEs encoded by IIB-LHE introns formed two well-supported clades, corresponding to IIB1 and IIB2, respectively (Figure 6B and Supplementary Figure S3). Phylogenetic analyses indicate that the IIB1 and IIB2 clades are sister clades to each other. This protein-based phylogenetic topology showed strong concordance with the phylogeny inferred from conserved intron RNA sequences, indicating that, following the formation of IIB-LHE introns, the intron RNA scaffolds and the associated LHE proteins displayed a coevolutionary pattern. Phylogenetic reconstruction indicated that the II-LHE families did not form a single independent evolutionary lineage. Instead, they were distributed in the IIB1 and IIB2 clades (Figure 6B), supporting a dual-origin scenario for mitochondrial IIB-LHE introns. One subgroup (rnl-2698 and rns-420) originated from an ancestral IIB1 lineage and shared a common ancestor with two IIB1-RT/M families (rnl-1963 and rns-780). The second subgroup (cox1-686, cox1-874, rnl-2080, and rns-670) originated from an ancestral IIB2 lineage and was homologous to the common ancestor of three IIB2-RT/M families (cox2-751, nad3-215, and nad3-216). These results support a model in which at least two distinct LHE genes independently invaded the RNA scaffolds of IIB1 and IIB2 introns during evolution.
Overall, the congruent RNA- and protein-based phylogenies, together with the dual origin of IIB-LHE introns, support coevolution of intron RNA scaffolds and IEPs and provide a phylogenetic framework for discussing repeated IEP invasion, degeneration, and turnover in Ulva organellar genomes.
3. Discussion
3.1. Modular Remodeling of DI and Variation at the Recognition Interface Between EBS and IBS
To integrate the distributional, structural, and motif-level patterns revealed by our comparative analyses, we interpret Ulva organellar group II intron families along a continuum of evolutionary states ranging from interface-driven retargeting and episodic spread to progressive degeneration and, in some cases, long-term host accommodation (Figure 7). This schematic framework is intended as a heuristic summary rather than a single deterministic pathway, and the canonical RT/M-encoding configuration is used as a practical reference for describing modular remodeling and IEP turnover (Figure 7A).
Within this framework, DI ranks among the most structurally variable regions across Ulva organellar introns. Recurrent modification or loss of DI substructures (e.g., IC2 and ID2) is observed across multiple lineages, highlighting DI as a major hotspot of scaffold remodeling. This variability does not imply instability of the ribozyme as a whole; instead, it is consistent with a conserved catalytic core supported by a permissive RNA scaffold that tolerates divergence in peripheral regions. Because DI mediates long-range tertiary contacts and provides interaction surfaces for both IEPs and host splicing factors [31,32], structural flexibility at this domain may facilitate persistence of introns across distinct organellar contexts without disrupting core splicing chemistry.
Variation at the EBS-IBS recognition interface provides an additional axis for intron diversification by enabling locus turnover without extensive changes to the overall scaffold. The two chloroplast IIB1-RT/M families, petB-169 and petB-277, remain highly similar in overall RNA scaffold organization yet differ substantially in their EBS motifs, particularly EBS1 and EBS2 within domain ID (Figure 7B). Because base-pairing between EBS and IBS motifs determines target-site recognition during retrohoming, this pattern is consistent with an “interface-first” mode of diversification in which the catalytic scaffold remains largely conserved while the recognition interface evolves rapidly to access new IBS targets [3]. A direct prediction of this model is that newly colonized loci should exhibit IBS signatures complementary to derived EBS variants, which can be evaluated through insertion-site sequence logos [1]. These observations support a jumping-degeneration continuum in which interface innovation promotes insertion-site switching and spread, whereas progressive structural decay constrains mobility and predisposes lineages to contraction or long-term fixation in organellar genomes [19,33].
Motif conservation patterns across intron subclasses further support this interpretation. The uniform retention of EBS1 and EBS2 in RT/M-encoding introns is consistent with strong constraints on the canonical EBS-IBS interface required for retrohoming and efficient splice-site definition. In contrast, the recurrent erosion of EBS2 in LHE-encoding lineages suggests relaxed constraints on retrohoming-type recognition, potentially reflecting a shift toward endonuclease-assisted mobility and/or reduced mobility. Moreover, the consistent co-occurrence of a 5′ terminal linker with loss of the canonical branch-point adenosine in mitochondrial IIB1-LHE introns indicates coordinated remodeling of exon-intron boundaries and lariat-formation signals, which may alter splicing pathways or increase dependence on host factors (Figure 3 and Figure 7).
These DI- and interface-centered signatures provide a mechanistic basis for the evolutionary continuum (Figure 7) and motivate the clade-specific structural innovation described in Section 3.2 as well as the modular replacement and degeneration of IEPs discussed in Section 3.3.
3.2. The Seven-Domain Secondary Structure Configuration Generated by a Lineage-Specific DIIIa Insertion
Despite an evolutionary trend toward compaction in Ulva organellar genomes, most group II introns examined here retain the canonical six-domain (DI–DVI) secondary-structure organization, indicating strong conservation of the overall ribozyme architecture among these organellar introns [4,34]. Nevertheless, lineage-specific structural innovations were also detected. Notably, four mitochondrial IIA1-RT/M families (atp1-990, cox1-643, cox1-760, and nad5-1057) harbor an additional accessory stem-loop element, designated DIIIa, inserted between domains DIII and DIV (Figure 7C).
Atypical bacterial group II introns with an extra 3′ extension downstream of DVI have been reported, and two stem-loop substructures have been described as part of this extension in the Bacillus cereus group [5]. However, based on our literature survey, we found no previous reports describing a DIIIa-like insertion positioned between DIII and DIV in organellar group II introns. We therefore interpret DIIIa as a lineage-specific mode of structural remodeling in organellar introns.
Phylogenetic analyses based on conserved intron RNA sequences further showed that these four DIIIa-containing intron families form a well-supported subclade within the IIA1 lineage, most parsimoniously consistent with a single ancestral DIIIa insertion followed by vertical inheritance rather than multiple independent gains. This pattern indicates that, alongside degeneration and domain reduction, organellar introns can also diversify via discrete acquisition of peripheral structural elements.
The functional consequences of the DIIIa insertion remain uncertain. Domain III participates in tertiary interactions and can influence folding pathways [31,32]. Insertion of an accessory stem-loop at the DIII-DIV junction could plausibly affect local folding dynamics or scaffold stability [33]. Importantly, this modification occurs outside the catalytic core (DV), which remains highly conserved in both length and sequence across the introns analyzed [4,34]. We therefore view the DIIIa-containing introns as a derived configuration in which catalytic capacity is maintained while peripheral remodeling proceeds.
Notably, these DIIIa-containing IIA1 families occur at relatively low frequencies (4.5–15.9%) in our dataset. Interpreting occurrence frequency as a long-term proxy that integrates invasion, retention, and loss (see Section 3.4), this pattern is consistent with limited spread and/or reduced persistence relative to broadly distributed families. The DIIIa insertion illustrates that organellar group II introns can undergo lineage-specific structural innovation beyond simple degeneration [35]. Future functional assays will be required to test whether DIIIa alters splicing efficiency, mobility, or dependence on host factors.
3.3. Evidence for Two Independent Origins of Mitochondrial IIB-LHE Introns
A central evolutionary inference from our phylogenetic analyses is that mitochondrial IIB-LHE introns in Ulva originated through at least two independent acquisition events rather than from a single monophyletic II-LHE lineage. This conclusion is supported by the combined topology of RT/M- and LHE-based phylogenies, together with their concordance with RNA scaffold relationships (Figure 6).
RT/M phylogenies show that RT/M proteins encoded by organellar group II introns cluster into two clearly separated clades corresponding to IIA and IIB introns. The topology of the IIA clade is fully congruent with that inferred from conserved intron RNA sequences, whereas the IIB clade shows broad, though not complete, agreement with the RNA-based phylogeny. Notably, the RNA-based analysis excludes all IEP-encoding regions, so the overall concordance between RNA- and protein-based trees supports a long-term coevolutionary association between intron RNA scaffolds and RT/M proteins [3,30]. Across Ulva organellar genomes, degeneration of IEPs is more common in IIB introns than in IIA introns, and this tendency is especially pronounced in mitochondrial IIB introns relative to their chloroplast counterparts, consistent with different selective regimes across the two organellar contexts.
LHE phylogenies resolve two well-supported clades corresponding to IIB1-LHE and IIB2-LHE introns. These protein-based groupings closely mirror the classification of their associated RNA scaffolds, suggesting that, following LHE acquisition, LHEs and intron RNAs evolved in a coupled manner [36,37]. Importantly, the II-LHE introns do not form a single monophyletic lineage; instead, they are distributed across both the IIB1 and IIB2 clades. The most parsimonious explanation is therefore that at least two distinct LHE genes independently invaded pre-existing IIB1 and IIB2 RNA scaffolds (Figure 7D). In addition, the presence of LHE-encoding IIB introns in three mitochondrial genes (cox1, rnl, and rns) suggests subsequent proliferation after these independent recruitment events, extending beyond the more restricted distributions reported for LHE-encoding IIB introns in other systems [3,38].
In the Ulva organellar genomes examined here, no complete RT/M ORF or recognizable RT/M remnants were detected in any IIB-LHE intron. This pattern is consistent with LHE acquisition occurring in lineages where the canonical RT/M-driven retrohoming machinery had already been inactivated or lost, although highly degraded remnants cannot be entirely excluded [19,37]. Consistent with this interpretation, RT/M-encoding intron families retain a complete set of EBS motifs, whereas EBS2 loss is observed only in some LHE-encoding families (Figure 7E). Mechanistically, replacement of RT/M with an LHE implies a shift in mobility strategy: retrohoming depends on functional RT/M proteins and intact EBS-IBS interactions [33,39], whereas LHEs can promote intron spread via site-specific DNA cleavage followed by repair-mediated integration [19,40]. Under this model, acquisition of an LHE may provide an alternative route for persistence or limited expansion once RT/M function becomes compromised. More broadly, the independent recruitment of LHEs into two distinct IIB scaffolds highlights group II introns as evolvable RNA platforms capable of accommodating different enzymatic modules [37]. The dual-origin pattern of IIB-LHE introns and the associated motif/ORF changes support modular enzyme replacement as a recurrent evolutionary mechanism in Ulva mitochondria.
3.4. Occurrence Frequency of Intron Families as a Proxy for Evolutionary State
Occurrence frequency of intron families varies greatly among Ulva organellar genomes, reflecting a dynamic history of intron gain and loss across both mitochondrial and chloroplast lineages [1,28]. In the framework proposed here (Figure 7), we interpret occurrence frequency not as a direct measure of current intron mobility but as a long-term proxy integrating invasion success, vertical persistence, and lineage-specific loss across the genus Ulva. Although indirect, this metric provides an empirical basis for comparing evolutionary trajectories among different intron families.
With the exception of the chloroplast infA-62, most intron families that are broadly distributed across species (>40% of sampled genomes) belong to IIB2 lineages, including nad3-216, petB-69, cox1-874, and rns-670. In contrast, many low-frequency RT/M-encoding families (<15%) show clear signatures of structural deterioration, most commonly involving loss of DI elements such as IC2 or ID2. The association between restricted occurrence frequency and RNA scaffold degeneration is consistent with a scenario in which progressive domain loss reduces intron mobility, limits further transmission, and ultimately predisposes lineages to contraction and eventual loss from organellar genomes [3].
Across the dataset, intron families therefore appear to span a continuum of evolutionary states. At one end are structurally intact lineages with broad distributions and relatively high transmission potential. At the other are late-stage introns characterized by reduced occurrence frequency, structural degeneration, and limited mobility. These distributional patterns complement the structural and motif-level signatures discussed above and collectively support the jumping-degeneration continuum (Figure 7).
An exception to this pattern is the chloroplast infA-62 family. Despite extensive structural reduction and complete loss of an IEP, this intron is stably retained across Ulva chloroplast genomes and shows clear evidence of compositional convergence toward the host genome [28]. Rather than reflecting recent mobility, this pattern is consistent with an alternative evolutionary regime in which reduced mobility facilitates long-term fixation at a specific genomic locus, promoting host accommodation rather than repeated spread [9]. These observations indicate that occurrence frequency captures the cumulative outcome of invasion, degeneration, and retention processes acting over evolutionary time.
3.5. Contrasting Mitochondrial and Chloroplast Regimes in Group II Intron Evolution
Intron size, scaffold remodeling, IEP turnover, and GC landscapes indicate that Ulva mitochondria and chloroplasts impose distinct evolutionary regimes on group II introns, consistent with the evolutionary continuum (Figure 7). In broad terms, mitochondrial introns show signatures of higher turnover and repeated invasion-degeneration cycles, whereas chloroplast introns more often exhibit longer-term retention and, in specific cases, host accommodation.
One axis of this contrast is intron size evolution. Differences in intron length between organellar compartments can be partly attributed to IEP type. RT/M ORFs embedded within DIV contribute substantially to intron length [19], whereas LHE ORFs are considerably shorter, producing much shorter II-LHE introns [37,38]. However, coding capacity alone does not fully explain size diversity. Intrafamilial length variation is frequently dominated by large changes in DIV, consistent with progressive degeneration of IEPs as an important contributor to intron contraction over evolutionary time [1,3]. In addition, extensive remodeling of RNA structural domains contributes to size change, including pronounced reductions in specific domains (e.g., near-minimal DII in cox1-643 and rnl-2080; severely truncated DIII in atpB-627, atpI-256, orf185-47, petB-69, and petD-87) as well as localized expansions such as insertion of an Ia-like element in petB-169 and petB-277. These patterns reinforce the view of group II introns as modular RNA architectures in which scaffold remodeling and IEP degeneration jointly shape intron size and structural organization.
A second axis is compositional evolution. Mitochondrial RT/M-encoding introns show pronounced GC enrichment relative to their chloroplast counterparts, suggesting distinct compositional and selective environments for intron evolution in mitochondria versus chloroplasts [39]. In many cases, intron GC contents deviate substantially from those of surrounding host genomes, consistent with partially decoupled compositional evolution and, in some instances, with recurrent invasion or horizontal transmission histories [3,39]. GC enrichment in mitochondrial RT/M-encoding introns may also reflect structural constraints associated with maintaining stable RNA secondary and tertiary interactions required for ribozyme folding and activity, because GC-rich sequences can enhance base-pairing stability and support long-range interactions within the intron scaffold [3,28]. By contrast, the chloroplast infA-62 family exhibits GC composition closely matching the chloroplast genome, pointing to compositional amelioration toward the host genome [28]. The compositional convergence of infA-62, together with its stable retention across species, is consistent with a domestication-like evolutionary trajectory (Figure 7F), in which reduced mobility and long-term persistence allow gradual structural and compositional integration into the host genome [9,28]. Here, “domestication-like” denotes fixation and compositional accommodation rather than functional co-option or recruitment [9].
Viewed together with the structural and protein-level patterns described above, these organelle-specific contrasts support a coherent evolutionary interpretation. Mitochondrial genomes appear to favor higher intron turnover, involving repeated cycles of invasion, enzymatic replacement or degeneration, and eventual loss, whereas chloroplast genomes appear to permit longer-term retention and host accommodation in specific intron families [12]. This contrast provides a useful comparative system for considering how genomic context shapes intron persistence, degeneration, and the potential for long-term integration within host organellar genomes [9,12].
The analyses presented here establish a robust comparative system for examining organellar group II intron diversification in Ulva, but several limitations should be noted. Firstly, our inferences are based primarily on secondary-structure reconstruction, occurrence frequency, and phylogenetic analyses, and direct experimental tests of splicing efficiency and protein activity were not performed [9,11]. Secondly, divergence at the EBS-IBS interface provides a plausible explanation for target-site switching, but experimental validation of altered IBS specificity will be required to confirm this mechanism [19,33]. Finally, broader taxon sampling across Ulvophyceae is likely to refine estimates of the timing and frequency of enzymatic invasion events.
4. Materials and Methods
4.1. Dataset Construction and Family Naming of Group II Introns
Organellar genome data of Ulva species were obtained from a curated dataset previously established by our laboratory and a systematic retrieval of publicly available sequences from the NCBI GenBank database (as of 31 January 2026). A total of 44 mitochondrial and 47 chloroplast genome sequences of Ulva species were included in the final dataset. All genome accession numbers used in this study are provided in Supplementary Tables S1 and S2. To ensure the accuracy and completeness of the data, all organellar genomes were checked and reannotated (e.g., using organelle-focused annotation workflows such as MFannot) [41]. Group II introns were validated using RNAweasel (https://megasun.bch.umontreal.ca/RNAweasel/; accessed on 7 July 2025) to confirm hallmark structural features of group II introns, including a conserved DV [42,43]. Mitochondrial and chloroplast introns were named according to their host genes and insertion sites, using homologous genes from the Ulva compressa mitochondrial genome (KY626327) and chloroplast genome (MW353781) as references, respectively. The orf185-47 family, which was only detected in the chloroplast genome of Ulva tepida (OL684341), was named according to its insertion site in orf185. Introns occurring at identical insertion sites of host genes and showing sequence homology were defined as belonging to the same intron family [22,23]. Mitochondrial and chloroplast intron datasets were constructed separately, including organelle type, species name, intron content, and GenBank accession numbers. Using NCBI ORF Finder, ORFs were identified within DIV of IEP-encoding introns, and amino acid sequences were extracted for full-length RT/Ms and LHE proteins, respectively. An IEP was considered intact if a continuous ORF covered the conserved RT/M (or LHE) core regions without premature stop codons. Otherwise, it was classified as degenerated [29].
4.2. Secondary Structure Construction and Comparative Analyses of Group II Introns
Secondary structure prediction was performed to reconstruct representative intron structures and key interaction sites. To capture typical tertiary interaction motifs within each intron family, a custom R script (available upon reasonable request) was used to generate a representative sequence by selecting, at each alignment position, the nucleotide with the highest frequency among members of each intron family. RNA secondary structures were reconstructed using RNAstructure [44] and the UNAFold web server [45,46], combined with manual adjustment, with particular emphasis on the folding of the core domains (DI–DVI) and the identification of key tertiary interaction sites. Predictions focused on preserving known group II intron architectural constraints rather than de novo folding. Subsequently, structural comparison and diversity assessment were carried out at two levels. Within each intron family, secondary structures of different members were compared to identify conserved base-pairing regions and variable loop regions. Between families, we systematically compared structural characteristics to evaluate divergence in domain size, GC content, domain gain/loss events, and variation in IEPs (typically located in DIV). All predicted secondary structures were visualized using VARNA v3-93 [47]. Intron families were classified using both diagnostic secondary structural features [30] and the identities of their associated IEPs. Differences in intron length and GC content were tested using two-way ANOVA with intron lineage and organellar compartment as fixed factors, followed by Tukey’s HSD post hoc tests.
4.3. Occurrence Frequency of Group II Intron Families
To characterize intron distribution patterns, we quantified occurrence frequency at the level of the genus Ulva [28]. For each intron family, homologous searches and precise identification were performed using BLASTN, with known intron boundary sequences and/or conserved structural regions as probes [48]. BLASTN searches were performed with an E-value cutoff of 1 × 10^−5^ and minimum query coverage of 30% (or as appropriate). Presence-absence patterns across species and organellar genomes were summarized to calculate the occurrence frequency and distribution breadth for each intron family. Families present in >40% of sampled organellar genomes of Ulva species were considered broadly distributed, whereas those present in <15% were considered narrowly distributed. Insertion-site analyses were subsequently conducted by extracting ~20 nt of genomic DNA sequence upstream and downstream of each intron insertion site from the corresponding annotated host gene region. Flanking sequences from all members of the same intron family were aligned using MEGA v11.0 [49], and sequence logos were generated with WebLogo v3 to visualize nucleotide conservation and consensus patterns at insertion sites [50].
4.4. Phylogenetic Analysis Based on Conserved Intron RNA Sequences
For phylogenetic reconstruction, highly conserved regions corresponding to the catalytic core of group II intron RNAs were manually identified, extracted, and curated from all intron sequences. To reduce alignment ambiguity and minimize the impact of lineage-specific structural variability, a total of 12 mutation-hotspot regions were excluded prior to analysis. These excluded regions comprised the 5′ terminal sequence (present only in a subset of IIB-LHE introns); loop regions within domain IA; regions corresponding to domains Ia, ICa, and ICb; IC2; ID2 together with its adjacent connecting regions; IDa; the DIIIa region (restricted to certain mitochondrial introns); and loop regions within DII, DIII, and DIV. After curation, 793 nucleotide positions were retained and used for phylogenetic inference. Maximum-likelihood (ML) trees were reconstructed using IQ-TREE v3.0.1 [51]. The optimal nucleotide substitution model was selected using ModelFinder [52], which identified GTR + F + I + G4 as the best-fitting model. Branch support was evaluated using both 1000 ultrafast bootstrap replicates (UFBoot2) [53] and 1000 SH-like approximate likelihood ratio tests (SH-aLRT) [54]. Resulting phylogenetic topologies were examined to assess whether intron families cluster according to host gene identity, organelle origin, or species relationships, thereby providing insight into potential evolutionary trajectories, including vertical inheritance and horizontal transfer.
4.5. Phylogenetic Analyses Based on RT/Ms and LHEs
To evaluate evolutionary coupling between intron RNAs and their associated IEPs, phylogenetic analyses were conducted separately for RT/M and LHE proteins. Because a proportion of intron sequences showed varying degrees of degeneration in IEPs, only sequences retaining sufficient length and recognizable conserved ORFs were included. In total, 140 RT/M protein sequences and 47 LHE protein sequences were retrieved for analysis. Protein sequences were aligned at the amino acid level using MUSCLE as implemented in MEGA v11.0 [49,55]. Maximum-likelihood phylogenies were reconstructed using IQ-TREE v3.0.1 [51]. For RT/M proteins, the alignment consisted of 140 sequences spanning 681 amino acid positions. ModelFinder was used to select Q.PFAM + F + I + G4 as the optimal substitution model [52], and node support was assessed using 1000 ultrafast bootstrap replicates [53] and 1000 SH-aLRT tests [54]. For LHE proteins, 47 sequences comprising 193 aligned amino acid positions were analyzed. ModelFinder was employed to select VT + G4 as the best-fitting model [52], and support values were likewise assessed using 1000 ultrafast bootstrap replicates [53] and 1000 SH-aLRT tests [54]. To infer coevolutionary relationships, phylogenetic topologies derived from conserved intron RNA sequences were compared directly with those obtained from RT/M and LHE protein datasets. Patterns of congruence and conflict among the three trees were examined in detail to assess evolutionary coupling between intron RNA scaffolds and their IEPs and to infer key evolutionary events such as coordinated divergence, protein loss, or modular enzymatic replacement. All analyses were performed using established bioinformatic software with fixed parameters as specified, ensuring methodological reproducibility of the results.
5. Conclusions
This study presents the first genus-wide comparative analysis of organellar group II introns in the green macroalgal genus Ulva. By integrating RNA secondary structure reconstruction, intron occurrence patterns, and phylogenetic analyses of both intron RNA scaffolds and IEPs, we provide a comprehensive framework for understanding the evolutionary dynamics of these mobile ribozymes in compact organellar genomes. Our results demonstrate that Ulva organellar group II introns retain a highly conserved catalytic core while exhibiting extensive lineage-specific variation in peripheral RNA domains and associated enzymatic components. Structural analyses revealed recurrent remodeling of RNA scaffolds, particularly within DI, as well as lineage-specific innovations such as the DIIIa insertion in a subset of mitochondrial IIA1 introns. Phylogenetic analyses further support strong coevolution between intron RNAs and their IEPs and indicate that mitochondrial IIB-LHE introns originated through at least two independent acquisitions of LHE. Patterns of intron occurrence frequency and structural integrity suggest that individual intron families occupy different positions along an evolutionary continuum ranging from structurally intact and widely distributed lineages to highly degenerated, low-mobility introns. In addition, comparisons between mitochondrial and chloroplast genomes reveal contrasting evolutionary regimes, with mitochondrial introns showing more frequent structural remodeling and IEP turnover, whereas chloroplast introns more often exhibit long-term retention and, in some cases, host accommodation. These findings indicate that organellar group II introns in Ulva evolve through multiple, non-exclusive evolutionary trajectories shaped by RNA scaffold plasticity, enzymatic replacement, and organellar genomic context. The comparative framework established here highlights Ulva organellar genomes as a valuable model system for investigating the long-term evolution of mobile ribozymes in compact genomes.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Lambowitz A.M. Zimmerly S. Group II Introns: Mobile Ribozymes That Invade DNA Cold Spring Harb. Perspect. Biol.20113 a 00361610.1101/cshperspect.a 00361620463000 PMC 3140690 · doi ↗ · pubmed ↗
- 2Costa M. Group II Introns: Flexibility and Repurposing Front. Mol. Biosci.2022991615710.3389/fmolb.2022.91615735865004 PMC 9294222 · doi ↗ · pubmed ↗
- 3Zimmerly S. Semper C. Evolution of Group II Introns Mob. DNA 20156710.1186/s 13100-015-0037-525960782 PMC 4424553 · doi ↗ · pubmed ↗
- 4Toor N. Keating K.S. Taylor S.D. Pyle A.M. Crystal Structure of a Self-Spliced Group II Intron Science 2008320778210.1126/science.115380318388288 PMC 4406475 · doi ↗ · pubmed ↗
- 5Tourasse N.J. Stabell F.B. KolstøA.-B. Structural and Functional Evolution of Group II Intron Ribozymes: Insights from Unusual Elements Carrying a 3′ Extension New Biotechnol.20102720421110.1016/j.nbt.2010.02.01420219707 · doi ↗ · pubmed ↗
- 6Keating K.S. Toor N. Perlman P.S. Pyle A.M. A Structural Analysis of the Group II Intron Active Site and Implications for the Spliceosome RNA 2010161910.1261/rna.179131019948765 PMC 2802019 · doi ↗ · pubmed ↗
- 7Robart A.R. Chan R.T. Peters J.K. Rajashankar K.R. Toor N. Crystal Structure of a Eukaryotic Group II Intron Lariat Nature 201451419319710.1038/nature 1379025252982 PMC 4197185 · doi ↗ · pubmed ↗
- 8Toor N. Rajashankar K. Keating K.S. Pyle A.M. Structural Basis for Exon Recognition by a Group II Intron Nat. Struct. Mol. Biol.2008151221122210.1038/nsmb.150918953333 PMC 2735234 · doi ↗ · pubmed ↗
