Genomic insights into the pathogenicity of a ‘Candidatus Phytoplasma asteris’ associated with Trema levigata witches’ broom disease in China
Qiao Kai, Wan Qionglian, Li Xuemei, Wang Lianchun, Su Fan, Lei Jinfu, Shangguan Muzi, Li Mei, Shahzad Munir, Cai Hong

TL;DR
This study uses a culture-free method to analyze the genome of a phytoplasma causing witches' broom disease in a plant species in China.
Contribution
The study presents a draft genome of a phytoplasma using a culture-independent approach, revealing its metabolic and pathogenic features.
Findings
The phytoplasma genome is 849.7 kb with 1356 predicted genes, 587 of which are functionally annotated.
The genome includes complete glycolysis and pyruvate metabolism pathways and specific transporters for host metabolites.
Thirty-one putative secreted proteins and three potential mobile units were identified, suggesting mechanisms for pathogenicity.
Abstract
Phytoplasma research encounters limitations due to the lack of availability of pure cultures of these microorganisms. In this study, a culture-independent approach was employed to investigate the genome and pathogenic mechanisms of phytoplasma responsible for witches’ broom disease in Trema levigata (Yunnan province, China). The phytoplasma genome was assembled using Illumina sequencing data and the Phytoassembly pipeline based on mixed samples. Nested PCR analysis identified a 16Sr group I, ‘Candidatus Phytoplasma asteris’ strain in Trema levigata showing witches’ broom disease. Comparative study between infected and healthy plants resulted in an 849.7 kb draft genome with 94.1% coverage, 27.6% GC content, encoding 1356 predicted genes, of which 587 were functionally annotated. Multilocus phylogenetic analysis showed that this phytoplasma is closely related to ‘Ca. P. asteris’. The…
Funding
- Yunnan Province Local Undergraduate Universities (Partial) Basic Research Joint Special Project - Youth Project
- Yunnan Province Local Undergraduate Universities (Partial) Basic Research Joint Special Project - General Project
- National Natural Science Foundation of China (https://doi.org/10.13039/501100001809)
Taxonomy
Topics: Phytoplasmas and Hemiptera pathogens · Plant Virus Research Studies · Fungal Infections and Studies
Introduction
Phytoplasmas are cell-wall-deficient, obligate prokaryotic pathogens that colonize plant phloem and insect vectors [1–3]. Their unique biology has historically constrained research progress [4]. Although these pathogens are associated with diseases in hundreds of globally important agricultural and forestry crops, persistent limitations of traditional methods have left key gaps in understanding phytoplasma genome structure, pathogenicity mechanisms, and evolutionary adaptation [5]. In Yunnan Province, a recognized biodiversity hotspot in China, phytoplasma infections have been reported in five native tree species and seven commercial crops [6, 7]. However, genomic characterization of indigenous phytoplasma strains in this region remains scarce.
The unique biology of phytoplasmas creates inherent research challenges. Prolonged adaptation to nutrient-abundant host environments has resulted in the loss of essential metabolic pathways, including enzymes required for the tricarboxylic acid (TCA) cycle, pentose phosphate pathway, and fatty acid biosynthesis, and in vitro culture of pure strains has not yet been achieved [8–10]. Furthermore, the uneven distribution of phytoplasmas within host tissues complicates enrichment of their genetic material. Conventional techniques such as cesium chloride density gradient centrifugation and pulsed-field gel electrophoresis (PFGE) have achieved partial genome isolation but are limited by high costs and technical complexity [11, 12]. Metagenomic approaches leveraging next-generation sequencing offer alternatives, although the lack of host reference genomes presents significant challenges [4]. Recent bioinformatic pipelines that exploit differences in sequencing coverage between host and pathogen in Illumina data have markedly improved genome assembly efficiency for phytoplasmas infecting non-experimental host species [13].
Currently available phytoplasma genomes (>30 published) show significant reduction (500–900 kb) and often contain a number of Potential Mobile Units (PMUs) [14]. These transposase-associated gene clusters may promote genomic flexibility through recombination, potentially enhancing environmental adaptability [11]. Importantly, membrane-associated and secreted effector proteins (e.g., TENGU) have been implicated as key virulence determinants responsible for hallmark symptoms, including witches’ broom and stunting [3, 15, 16]. Considering the likely strain-specific differences in pathogenic mechanisms, expanding the genomic repository is essential to better elucidate evolutionary and virulence patterns.
Trema levigata (Cannabaceae), a pioneering tree species known for its rapid growth and ability to thrive in nutrient-poor soils, holds significant ecological and economic value. However, a witches’ broom disease was discovered in Xinping Yi and Dai Autonomous County, Yuxi City, Yunnan Province, China, causing considerable growth reduction and plant death. Initial phylogenetic analysis using the 16S rRNA gene identified the associated pathogen as a ‘Candidatus Phytoplasma asteris’ strain (16SrI group); however, the absence of genomic data has hindered a full understanding of its disease mechanisms [17]. This study examines the T. levigata witches’ broom agent detected in Yunnan, China, using high-throughput Illumina sequencing coupled with the Phytoassembly bioinformatics pipeline. This approach was applied for the first time to this disease to assemble the phytoplasma genome directly from mixed host-pathogen samples. This culture-independent and host-reference-free approach simultaneously yields genomic data for both the host and the pathogen, providing an efficient and cost-effective strategy for phytoplasma research. Through comparative genomics and functional annotation, adaptive traits and potential virulence factors of the T. levigata witches’ broom phytoplasma were identified, thereby establishing a molecular basis for disease management while offering a methodological framework for genomic studies of uncultured phytoplasmas.
Materials and methods
Wild-growing symptomatic T. levigata samples from plants exhibiting witches’ broom disease (Fig. 1B) were collected from Pingdian Township, Xinping Yi and Dai Autonomous County, Yuxi City, Yunnan Province, China (Coordinates: 101.843422°E, 24.021594°N). Healthy T. levigata samples (Fig. 1A) were collected from Huanian Town, Eshan Yi Autonomous County, Yuxi City, Yunnan Province (Coordinates: 102.211513°E, 24.087931°N). All plant samples were identified as T. levigata by Dr. Qionglian Wan based on the Flora of China, and confirmed by comparison with voucher specimens at the Herbarium of the Institute of Botany, Chinese Academy of Sciences, Beijing city, China (voucher number: 01847045).
Fig. 1. Comparative morphology of healthy and witches’ broom symptomatic Trema levigata plants and nested PCR detection. A: Healthy T. levigata; B: T. levigata exhibiting witches’ broom symptoms; C: Nested PCR detection of the phytoplasma 16S rRNA gene (Lane M: 2000 bp DNA marker; +: positive control (Camptotheca acuminata witches’ broom phytoplasma, GenBank: GCA_041276565.1); TLWB1/TLWB2: symptomatic T. levigata samples 1/2; TL: asymptomatic T. levigata; -: negative control (NTC)). Suppl. Fig. 1: 1% TBE agarose gel of PCR-amplified phytoplasma DNA. Suppl. Fig. 2: Field symptoms of Trema levigata witches’ broom.
Nucleic acid extraction and nested PCR detection
Total genomic DNA was extracted from 3 g of fresh leaf tissue following grinding in liquid nitrogen, using the Omega Plant DNA Extraction Kit (Omega, Georgia, US) according to the manufacturer’s instructions. Nested PCR amplification targeting the phytoplasma 16S rRNA gene was performed using universal primer pairs P1/P7 and R16F2n/R16R2 (nested PCR reaction) [18–21]. A laboratory-maintained Camptotheca acuminata phytoplasma sample (GenBank: GCA_041276565.1) was used as the positive control, and DEPC-treated water served as the negative control. PCR amplification was conducted using Phanta® Max Super-Fidelity DNA Polymerase (Vazyme Biotech, Nanjing, China). The 25 µL reaction mixture contained: 12.5 µL of 2× Phanta Max Buffer, 0.5 µL of dNTP Mix (10 mmol·L⁻¹ each), 1 µL each of forward and reverse primers (10 µmol·L⁻¹), 0.5 µL of Phanta® Max DNA Polymerase, 0.5 µg of DNA template, and an appropriate volume of DEPC-treated sterile ultrapure water to make a final volume of 25 µL. PCR products were resolved on a 1% TBE agarose gel, and amplicons were purified and subjected to Sanger sequencing (Tsingke Biotechnology, Beijing, China).
Genome sequencing and data processing
Leaf midribs were collected from three individual witches’ broom-diseased plants (PCR-positive biological replicates) and one healthy plant (negative control). The midribs were dissected, flash-frozen in liquid nitrogen, and stored at −80 °C. Total genomic DNA was extracted from 0.5 g sample aliquots. Sequencing libraries were constructed and subjected to whole-genome sequencing (WGS) on an Illumina NovaSeq X Plus platform (Personalbio, Shanghai, China) with a paired-end (PE) 2 × 150 bp read configuration. The sequencing depth was 10 Gb per sample. Raw sequencing reads were quality-controlled and adapter-trimmed using FastQC (Version 0.12.1). De novo genome assembly followed the Phytoassembly pipeline (Version 0.9.2) described by Polano et al. [13]. Following sequencing and preliminary assembly of all samples, the dataset with the most robust assembly quality, judged by contiguity (e.g., N50) and completeness, was selected for subsequent in-depth analysis.
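The contiguity metric mentioned above can be illustrated with a minimal sketch. The function name and the toy contig lengths are hypothetical and are not taken from the study; the code only shows how N50 and L50 are conventionally derived from a list of contig lengths.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50 is the length of the contig at which the running total of
    lengths, sorted from longest to shortest, first reaches half of the
    assembly size; L50 is the number of contigs needed to get there.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig is required")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy contig lengths, invented for illustration only.
toy_contigs = [100, 80, 60, 40, 20]
print(n50_l50(toy_contigs))  # (80, 2)
```

When several candidate assemblies are compared, a higher N50 together with a lower L50 indicates a less fragmented assembly, which is one plausible reading of the selection criterion described here.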
Genome analysis
The draft phytoplasma genome was annotated using the RAST server [22] (https://rast.nmpdr.org/, accessed April 1, 2025). To identify putative secreted proteins, the complete set of predicted protein sequences was analyzed as follows: SignalP v6.0 [23] (https://services.healthtech.dtu.dk/services/SignalP-6.0/, accessed April 1, 2025) was used to predict signal peptides. Given that phytoplasmas are divergent Gram-positive bacteria, we selected the ‘Other’ organism group for prediction. To maximize prediction accuracy and detail, we used the ‘Slow’ model with ‘Long’ output format. This analysis was performed on the full-length protein sequences; all input sequences were greater than 10 amino acids in length and thus within the reliable prediction range of the tool. Mature protein sequences (i.e., sequences after the predicted signal peptide cleavage site) were subsequently analyzed with TMHMM v2.0 [24] to identify transmembrane domains. Proteins without predicted transmembrane domains were manually inspected to exclude those with well-characterized functions (e.g., ABC transporters); remaining candidates were annotated as putative secreted proteins.
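The filtering logic described above can be sketched as follows. This is an illustrative outline only: the `Prediction` record, its field names, and the toy entries are assumptions, and the code does not parse real SignalP or TMHMM output files.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    protein_id: str
    signal_peptide: bool      # signal peptide predicted in the precursor
    tm_helices_mature: int    # transmembrane helices predicted after cleavage
    known_function: bool      # flagged during manual inspection


def putative_secreted(predictions):
    """Keep proteins with a signal peptide, no TM helix, and no known function."""
    return [
        p.protein_id
        for p in predictions
        if p.signal_peptide and p.tm_helices_mature == 0 and not p.known_function
    ]


# Toy records, invented for illustration only.
toy = [
    Prediction("prot_A", True, 0, False),   # retained
    Prediction("prot_B", True, 2, False),   # membrane protein, dropped
    Prediction("prot_C", False, 0, False),  # no signal peptide, dropped
    Prediction("prot_D", True, 0, True),    # well-characterized function, dropped
]
print(putative_secreted(toy))  # ['prot_A']
```

In the study the last criterion was applied by manual inspection, so the boolean flag here stands in for a curation step rather than an automated rule.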
The MOTIF search tool (https://www.genome.jp/tools/motif/MOTIF.html, accessed April 1, 2025) identified protein domains associated with effectors and Potential Mobile Units (PMUs) [25]. Proksee facilitated genome visualization (https://proksee.ca/, accessed April 1, 2025) [26]. Orthologous single-copy sequences were identified using OrthoFinder [27] (ver 2.5.5). Multiple sequence alignment employed MUSCLE (ver 5.3) [28], with conserved blocks extracted using Gblocks (ver 0.91b) [29]. Concatenated alignments were analyzed with ProtTest3 (ver 3.4.2) [30] to determine optimal substitution models. Maximum-likelihood phylogenetic trees were constructed using MEGA (ver 12) [31]. Metabolic pathway reconstruction utilized BlastKOALA [32] (https://www.kegg.jp/blastkoala/, accessed April 1, 2025) against the KEGG database. Functional annotation was performed with eggNOG-mapper [33] (http://eggnog-mapper.embl.de/, accessed April 1, 2025).
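The concatenation step that precedes model selection can be illustrated with a minimal sketch. The function, the nested-dictionary input layout, and the toy sequences are assumptions made for illustration; the sketch does not reproduce how the authors joined their trimmed alignments.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix.

    `alignments` maps gene name -> {taxon name -> aligned sequence}.
    Only taxa present in every gene are kept, which matches the idea
    of single-copy orthologs shared by all genomes.
    """
    if not alignments:
        raise ValueError("no alignments supplied")
    taxa = sorted(set.intersection(*(set(block) for block in alignments.values())))
    supermatrix = {taxon: "" for taxon in taxa}
    for gene in sorted(alignments):
        block = alignments[gene]
        widths = {len(block[taxon]) for taxon in taxa}
        if len(widths) > 1:
            raise ValueError(f"sequences in {gene} differ in aligned length")
        for taxon in taxa:
            supermatrix[taxon] += block[taxon]
    return supermatrix


# Toy aligned blocks, invented for illustration only.
toy = {
    "geneA": {"strain1": "MK-L", "strain2": "MKAL"},
    "geneB": {"strain1": "TT", "strain2": "TS"},
}
print(concatenate_alignments(toy))
# {'strain1': 'MK-LTT', 'strain2': 'MKALTS'}
```

Genes are joined in sorted name order so that the column layout is reproducible across runs.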
Results
On July 22, 2024, symptoms of witches’ broom were observed on T. levigata plants in Xinping County (Supplemental Fig. 2). Branches exhibiting witches’ broom symptoms were collected and processed in the laboratory, where nested PCR detection confirmed phytoplasma presence (Fig. 1C). Subsequently, both negative (healthy) and positive (infected) plant samples were submitted to Shanghai Personal Biotechnology Co., Ltd. (China) for genomic DNA library preparation and whole-genome sequencing (WGS) on an Illumina NovaSeq X Plus platform with a paired-end (PE) 2 × 150 bp read configuration, achieving a sequencing depth of 10 Gb per sample. Raw data were deposited in NCBI under accession numbers SRR33445509 and SRR33445508. After quality control using FastQC, the Phytoassembly pipeline was employed for genome assembly. The assembled genome size of the negative plant was 364 Mb, which served as the reference genome for subsequent analyses. The comparison between the positive plant genome and negative reference genome resulted in 1.4 Gb of non-matching reads, which were further filtered to obtain an 836 kb phytoplasma genome.
General features of the T. levigata and phytoplasma genomes
The positive sample, containing 9.8 Gb of raw sequencing data, provided 11,652× coverage of the T. levigata witches’ broom phytoplasma genome, which was assembled into 380 contigs with a total size of 849,708 bp (Fig. 2; NCBI BioSample number: SAMN48173560). BUSCO assessment revealed a draft assembly completeness of 94.1%, with a GC content of 27.6%, an N50 of 23,241 bp, and an L50 of 9. RAST annotation identified 1,356 coding sequences (CDSs), of which 587 had defined functions and 769 were hypothetical proteins (Supplementary Table 1). Additionally, 34 ribosomal sequences were annotated (16 rRNAs and 28 tRNAs).
Fig. 2. Draft genome map of the Trema levigata witches’ broom phytoplasma. The innermost and outermost rings represent mobileOG annotations for the negative and positive strands, respectively: red indicates replication/recombination/repair, yellow-green represents integration/excision, bright purple denotes transfer, and light green indicates phage. The second ring from the inside displays GC Skew. The third ring shows GC content. The fourth and sixth rings correspond to annotated features on the negative and positive strands, with blue representing CDS, dark red indicating RNA, and light purple denoting tRNA. The fifth ring illustrates the draft genome framework. This figure was generated using Proksee (https://proksee.ca/, accessed April 1, 2025).
The negative sample, with 9.6 Gb of raw sequencing data, provided 26× coverage of the T. levigata genome, which was assembled into 325,004 contigs with a total size of 362,568,904 bp (NCBI BioSample number: SAMN48908987). BUSCO assessment indicated a genome assembly completeness of 92%, with 7.3% of genes partially covered and 0.7% missing. The GC content was 33.9%, and the N50 was 13,457 bp. Augustus annotation predicted a total of 62,862 CDSs.
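As a rough check on the host coverage figure, mean depth can be approximated as total sequenced bases divided by assembly size. The sketch below assumes that all 9.6 Gb of raw data derive from the host and that the assembled size stands in for the true genome size; both are simplifications, and the function name is hypothetical.

```python
def mean_depth(total_bases, genome_size_bp):
    """Approximate mean sequencing depth as bases sequenced per genome base."""
    return total_bases / genome_size_bp


# Values reported for the healthy (negative) sample.
host_depth = mean_depth(9.6e9, 362_568_904)
print(round(host_depth))  # 26
```

The result agrees with the reported 26× coverage of the T. levigata assembly.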
Phylogenetic analysis
Phylogenetic analyses were performed using 16S rRNA sequences and 12 orthologous single-copy genes; an extended analysis of 139 orthologous genes provided enhanced taxonomic resolution and stronger nodal support. The newly identified T. levigata witches’ broom phytoplasma exhibited 95.38% ANI and 99.03% 16S rRNA sequence identity with ‘Ca. P. asteris’ (GenBank: GCA_038505995.1), forming a well-supported monophyletic clade within the 16SrI group that included Santalum album aster yellows phytoplasma (GenBank: GCA_018283495.1), Aster yellows witches’-broom phytoplasma (GenBank: GCA_000012225.1), and ‘Ca. P. asteris’ (GenBank: GCA_038505995.1) (Fig. 3, Supplementary Table 2). These results corroborate the previous findings regarding the phylogenetic positioning of this phytoplasma lineage [17].
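The identity values above can be compared against the species-level thresholds that the Discussion attributes to the IRPCM guidelines (98.65% 16S rRNA identity and 95% ANI). The following is a minimal sketch with a hypothetical function name; it checks only these two numeric criteria and does not replace the multilocus analysis that the guidelines also call for.

```python
def within_species_thresholds(identity_16s, ani, min_16s=98.65, min_ani=95.0):
    """Return True when both identity values meet the stated minimums (percent)."""
    return identity_16s >= min_16s and ani >= min_ani


# Values reported here for the comparison with 'Ca. P. asteris'.
print(within_species_thresholds(identity_16s=99.03, ani=95.38))  # True
```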
Fig. 3. Molecular genetic analyses. A: Phylogenetic tree based on 139 homologous single-copy protein sequences; B: Phylogenetic tree based on the 16S rRNA gene. The Acholeplasma species were included as outgroups to root the tree. Numbers on branches indicate bootstrap support values (based on 1000 replicates). The numbers at the nodes represent the number of substitutions per site. The heatmaps on the right represent pairwise nucleotide identity (B) and amino acid identity (A), respectively. In (A), Average Amino Acid Identity (AAI) values are color-coded as follows: >80% (red), >60% (yellow), and <60% (blue). In (B), Average Nucleotide Identity (ANI) values are indicated as: >94% (red), >88% (yellow), and <88% (blue).
Functional classification and analysis of phytoplasma genes
COG annotation of the T. levigata witches’ broom phytoplasma genome identified 697 functionally annotated genes. The top ten functional categories were: Replication, recombination and repair (L); Translation, ribosomal structure and biogenesis (J); Posttranslational modification, protein turnover, chaperones (O); Transcription (K); Function unknown (S); Cell cycle control, division and chromosome partitioning (D); Nucleotide transport and metabolism (F); Inorganic ion transport and metabolism (P); Amino acid transport and metabolism (E); and Defense mechanisms (V) (Supplementary Table 3 and Supplementary Fig. 1). Further annotation revealed that the 142 genes in category L primarily encoded DnaB, YqaJ, and AAA+ family proteins. Category J contained 122 genes, dominated by ribosomal proteins and tRNA-related elements. The 90 genes in category O were predominantly AAA family chaperones. Category S comprised 53 genes mainly belonging to DUF families, methyltransferases, and PmbA_ family proteins. Category K included 50 genes encoding sigma factors, while category D featured 47 genes associated with DUF and CheB families. Category F contained 49 genes principally involved in thymidylate and flavodoxin functions. Category P encompassed 38 genes encoding amino acid transport, ABC transporters, lipoproteins, and cation ATPases. Category E consisted of 23 genes encoding peptidases, MCP signal proteins, and asparagine synthases. Category V comprised 15 genes encoding ABC membrane transporters, MatE multidrug efflux systems, and HsdM restriction-modification components.
KEGG annotation identified 573 genes, with the top five enriched pathways being Transporters, DNA repair and recombination proteins, Ribosome, Prokaryotic defense systems, and Transfer RNA biogenesis. Metabolic analysis revealed that T. levigata witches’ broom phytoplasma possesses a complete glycolytic core module for three-carbon compounds, pyruvate oxidation capability (converting pyruvate to acetyl-CoA via pyruvate dehydrogenase complex genes pdhA/B, DLAT, and DLD), phosphate acetyltransferase-acetate kinase pathway (acetyl-CoA to acetate conversion), and phosphatidylethanolamine (PE) biosynthesis (PA to PS to PE conversion). Reconstruction of its metabolic network (Fig. 4) confirmed retention of a complete glycolytic pathway from glucose-6-phosphate to pyruvate, with all key enzyme genes identified (pgi, pfkA, fba, TPI, gapA, pgk, gpmI, eno, pyk). Notably, it exhibited unique membrane lipid synthesis capabilities, including Sn-glycerol-3-phosphate production via gpsA-encoded dehydrogenase, CDP-diacylglycerol synthesis by CdsA, and a 1-acyl-sn-glycerol-3-phosphate pathway mediated by plsY and plsC. Complete nucleotide salvage pathways were identified, encompassing thymidylate synthesis, deoxyuridine triphosphate conversion (dut, tdk, tmk), and pyrimidine nucleotide interconversion networks (pyrH, pyrG, cmk), potentially supporting its high-frequency genomic recombination. Folate metabolism showed parasitic adaptation traits: retention of dihydrofolate reductase (folA) and dihydropteroate synthase (folP), but loss of de novo synthesis, indicating dependence on exogenous 4-aminobenzoate and pterin precursors (Fig. 4, Supplementary Table 3).
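The notion of a complete pathway used above amounts to checking that every required gene appears in the annotation. The sketch below illustrates that check for the glycolytic genes listed in the text; the function name and the example annotation sets are hypothetical, and real KEGG module completeness is evaluated on KO identifiers rather than gene symbols.

```python
# Glycolytic genes named in the text (glucose-6-phosphate to pyruvate).
GLYCOLYSIS_GENES = ["pgi", "pfkA", "fba", "TPI", "gapA", "pgk", "gpmI", "eno", "pyk"]


def missing_genes(required, annotated):
    """Return required genes absent from the annotation (case-insensitive)."""
    annotated_lower = {gene.lower() for gene in annotated}
    return [gene for gene in required if gene.lower() not in annotated_lower]


# Hypothetical annotation sets, invented for illustration only.
complete_set = set(GLYCOLYSIS_GENES) | {"pdhA", "pdhB"}
partial_set = complete_set - {"eno"}

print(missing_genes(GLYCOLYSIS_GENES, complete_set))  # []
print(missing_genes(GLYCOLYSIS_GENES, partial_set))   # ['eno']
```

An empty result corresponds to the complete glycolytic pathway reported for this genome.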
Fig. 4. An overview of the metabolic pathways in Trema levigata witches’ broom phytoplasma. The functional genome, predicted through KEGG analysis, reveals key metabolic pathways and transport proteins. Genes associated with the salvage pathway are indicated in red; genes involved in the glycolysis pathway are marked in purple. Trema levigata witches’ broom phytoplasma can fully utilize glucose-6-phosphate to generate acetyl-CoA. Genes related to the glycerophospholipid metabolism pathway are shown in dark green; genes involved in the folate biosynthesis pathway are highlighted in blue. A variety of transporter system genes (TroA/B/D, OppB/C/D/F, LolC/E, SugC, EcfA1/A2/T, PotA/B/C/D, LysX/Y, MetQ/I/N, SecA/Y/E, YidC, and EfrA/B) are labeled on the membrane.
Strikingly, the metabolic network displayed high host dependency, evidenced by complete transport systems for spermidine/putrescine, lysine, and D-methionine; glutamine transporters GlnH and GlnP; bacitracin transporter ATP-binding protein BceA; D-methionine permeases MetI/MetN; glutathione permease GsiD; EcfT/A1/A2 components; and ABCB subfamily ABC transporters EfrA/B. Additionally, genes encoding the Sec-SRP secretion pathway, including secA (ATPase), secY (channel pore), secE (stabilizing subunit), and yidC (membrane protein insertase), were identified, suggesting adaptive support for host exploitation through diverse transporter mechanisms.
Effector analysis and identification of potential mobile units
In the genome of the T. levigata witches’ broom phytoplasma, 31 putative secreted effector proteins were identified. Among these, three mature proteins were classified as small peptides (<10 amino acids) following the removal of their signal peptides. These were classified as secreted based on a high-confidence prediction of an N-terminal signal peptide in their precursor sequences, followed by the absence of transmembrane domains in the mature protein. Annotation revealed eight unique putative effectors, while the remaining 20 showed homology to effector proteins in other phytoplasma strains. Five were previously characterized effectors: TENGU, SAP05, SAP06/48-like, and SAP54 (Supplementary Table 4).
Motif analysis identified three putative Potential Mobile Units (PMUs) ranging from 2 to 9 kb (Fig. 5 and Supplementary Table 5). Functional annotation of coding sequences within these regions showed conserved genes across all PMUs, including ATP-dependent Zn protease, DNA-binding protein HU, DNA-directed RNA polymerase specialized sigma subunit, and site-specific DNA methylase. PMU15 uniquely encoded a thymidylate kinase-like protein; PMU23 contained a single-stranded DNA-binding protein; and PMU33 featured a replicative DNA helicase-like protein.
Fig. 5. Schematic representation of potential mobile units. Three potential mobile units (PMUs) were identified and designated as PMU15, PMU33, and PMU23. The assembly scaffold corresponding to each PMU is indicated in parentheses below its name. Numerical values on the horizontal axis denote the length of each PMU in base pairs (bp). Arrows of varying colors represent distinct genetic or functional components. A legend elucidating these components is provided on the right side of the figure.
In a search for potential horizontal gene transfer (HGT) arising from long-term host-phytoplasma interactions, one candidate HGT sequence (E-value: 3.86E-75), with homology to an aspartate aminotransferase gene, was found to be shared between the T. levigata and phytoplasma genomes. This sequence on phytoplasma scaffold_6 comprised three tandem asparagine synthetase genes. While the region also contained TRA5 and fliA annotations, no other canonical PMU marker genes were detected.
Discussion
Phytoplasmas are intracellular parasitic bacteria characterized by unique metabolic features and genetic diversity resulting from their high dependence on host organisms. Through sequencing and analysis of the T. levigata witches’ broom phytoplasma genome, this study elucidates its metabolic network, transporter systems, secreted proteins, and potential mobile units (PMUs), providing insights into its pathogenic mechanisms and evolutionary adaptations.
Streamlined methodology for phytoplasma genome assembly
Phytoplasmas were discovered in 1967, yet reliable in vitro culture methods have still not been established; this remains a key limiting factor that impedes both the procurement of high-quality DNA and the advancement of pathogenicity studies [34–36]. Traditionally, researchers extracted phytoplasma DNA from infected plant tissues, a challenging task due to the pathogen’s low and variable abundance (particularly in woody hosts), which necessitated complex enrichment and purification techniques such as CsCl equilibrium buoyant density gradient centrifugation (requiring bisbenzimide dyes) or pulsed-field gel electrophoresis (PFGE) for whole-chromosome isolation [37, 38].
Early genome sequencing efforts focused on ‘Ca. P. asteris’ strain OY-M [39] and ‘Ca. P. mali’ AT [38, 40]. However, the pronounced base-composition bias in the AT strain introduced specific assembly difficulties. To address this, long-read sequencing technologies and metagenomic approaches were adopted [4, 41]. The latter employs bioinformatic strategies to efficiently filter pathogen sequences from randomly sequenced DNA libraries of diseased plant samples. Notably, when constructing draft phytoplasma genomes using next-generation sequencing (NGS), the primary technical bottleneck lies in accurately identifying and separating pathogen genomic sequences from vast sequencing datasets.
Cesare Polano’s team innovatively developed the Phytoassembly bioinformatics pipeline to overcome this challenge. Built upon the IDBA-UD assembler, this workflow is optimized explicitly for mixed samples with uneven sequencing coverage, and its automated design enables rapid analysis even by researchers without specialized genomic expertise [13]. In this study, high-throughput Illumina sequencing of healthy and infected plants was performed, and the resulting data processed through this pipeline successfully generated a draft genome of T. levigata witches’ broom phytoplasma with 94.1% completeness. This methodological advancement markedly reduces the complexity of phytoplasma genome research and offers an essential groundwork for future functional investigations.
Phytoplasma genomic characteristics and structural analysis
Phytoplasma genomes generally demonstrate reductive evolution, with sizes ranging from 400 to 1,000 kbp, similar to other plant obligate pathogens such as ‘Ca. Liberibacter asiaticus’ [42]. The assembled T. levigata witches’ broom phytoplasma genome (849.7 kbp) falls within this range. However, its assembly, derived from Illumina short-read data, consisted of 380 contigs, showing a much higher level of fragmentation than previously documented phytoplasma genomes. This discrepancy may be due to: (1) the absence of a reference genome for the host plant T. levigata, which makes subtracting host DNA more difficult; and (2) the high complexity of non-matching reads from infected samples when reference genomes from healthy plants are used.
Notably, phytoplasma genomes possess a unique chromosomal organization with unclear evolutionary origins. Studies suggest viral sequences may influence genomic architecture; Wei et al. demonstrated Caudovirales phages critically shaped ‘Ca. P. asteris’ genome [43]. Here, MobileOG-db annotation identified three phage-related sequences, though none localized to Potential Mobile Units (PMUs)—potentially due to PMU detection challenges from high fragmentation. Intriguingly, PMU23 and PMU33 harbored abundant replication/recombination/repair elements, implicating these regions in genomic plasticity.
COG annotation identified 697 genes, comparable to ‘Ca. P. ziziphi’ [44] but exceeding other sequenced strains. This discrepancy may arise from: (1) gene duplication artifacts from tandem repeats; (2) pseudogene interference; or (3) strain-specific gene expansion. These findings highlight limitations of short-read assemblies, suggesting long-read sequencing could enhance completeness and annotation accuracy for deeper insights into phytoplasma evolution and pathogenesis.
Phylogeny and metabolic adaptations
Phytoplasma taxonomy adheres to the guidelines established by the IRPCM Phytoplasma/Spiroplasma Working Team, which require a minimum of 98.65% 16S rRNA sequence identity and 95% Average Nucleotide Identity (ANI), as well as multilocus sequence analysis (MLSA) [45, 46]. T. levigata witches’ broom phytoplasma showed 99.25% 16S rRNA identity and 95.38% ANI with ‘Ca. P. asteris’ (GenBank: GCA_038505995.1). Phylogenetic analysis based on orthologous single-copy genes demonstrated its closest relationship with Santalum album aster yellows phytoplasma, showing slight divergence from trees constructed using 16S rRNA data. Nonetheless, all strains with high homology belong to the 16SrI group, which aligns with previous MLSA findings. As a ubiquitous group globally, 16SrI phytoplasmas infect T. levigata, a pioneer plant species widely distributed across Yunnan Province, China. During surveys conducted in 2022, witches’ broom symptoms were observed on T. levigata at three sites, namely Shangri-La and Yuxi, with phytoplasma presence confirmed. Further investigation is required to identify the insect vectors involved and to understand the mechanisms of pathogenesis.
As obligate pathogenic prokaryotes, phytoplasmas exhibit reductive genomics and metabolic remodeling. Adapted to nutrient-rich phloem environments, they lack core metabolic pathways (e.g., TCA cycle, complete oxidative phosphorylation), instead evolving diverse energy-acquisition strategies: some strains (e.g., ‘Ca. P. asteris’ strain OY-W) amplify glycolysis via gene duplication, while others (e.g., ‘Ca. P. mali’) lose glycolytic genes and rely on alternative pyruvate synthesis [39]. The T. levigata witches’ broom phytoplasma shares a similar glucose-to-pyruvate conversion pathway with ‘Ca. P. rubi’, which lacks the genes for inorganic pyrophosphatase and F1Fo-ATP synthase, thus rendering it incapable of establishing a complete electron transport chain.
Notably, phytoplasmas maintain membrane potential, likely through transmembrane electrochemical gradients established by P-type ATPases (orthologous to eukaryotic Na⁺/K⁺- or H⁺/K⁺-ATPases). This unique energy-conversion mechanism may represent a key adaptation after the loss of standard oxidative phosphorylation. Such interspecific metabolic variations reflect adaptations to host environments, and their streamlined yet specialized energy networks provide critical insights into pathogenicity mechanisms.
Secretory pathways and effector mechanisms
Due to the lack of essential metabolic enzymes, phytoplasmas have developed various transporter systems to effectively extract vital nutrients (including sugars, amino acids, oligopeptides, and inorganic ions) from their hosts, demonstrating high metabolic dependence [47]. Studies indicate that phytoplasmas primarily utilize two secretion systems for parasitism: YidC mediates membrane protein integration, while the Sec system facilitates protein translocation and secretion into the host cytoplasm [48]. The Sec protein translocation system, essential for bacterial viability [49, 50], is well-characterized in Escherichia coli. Its core components, SecA, SecY, and SecE, are indispensable for translocation activity and cell survival; remarkably, these three proteins alone are sufficient to reconstitute translocation in vitro [51].
Conservation of the Sec system in phytoplasmas is well established. Genes encoding SecA, SecY, and SecE were identified in ‘Ca. P. asteris’ strain OY-M [52], with SecA expression confirmed in infected plants [53]. These genes have also been reported in several other phytoplasma genomes [38, 54, 55], and secY has been cloned from multiple strains. Collectively, this evidence indicates that functional Sec systems are ubiquitous in phytoplasmas. In T. levigata witches’ broom phytoplasma, genes encoding the complete core Sec machinery (secA, secY, secE) and yidC were identified, indicating the presence of the typical phytoplasma secretion mechanisms.
Regarding ATP-binding cassette (ABC) transporters, T. levigata witches’ broom phytoplasma exhibits distinct metabolic traits. The retention of a complete set of glycolysis genes correlates with a reduced number of transporters. Only intact spermidine/putrescine, lysine, and D-methionine transport systems were identified.
Polyamines (PAs), nitrogen-rich compounds containing multiple amine groups, play vital roles in plant development and stress responses [56]. Pathogen infection significantly induces the expression of PA metabolism genes [57], and disease-resistant cultivars accumulate higher levels of PA metabolites under stress [58]. The retention of PA transport channels in T. levigata witches’ broom phytoplasma may reflect a specialized strategy for nitrogen acquisition: in the absence of complete amino acid biosynthesis pathways, exogenous PAs could serve as a critical nitrogen source. Furthermore, putrescine and lysine are known to mitigate intracellular toxin accumulation [59]. This suggests that the methionine/lysine transporters facilitate nitrogen acquisition and contribute to neutralizing phytochemical toxicity. These insights advance the understanding of the nutritional adaptations underpinning host-phytoplasma coevolution.
Phytoplasma secreted proteins
Genomic analyses indicate phytoplasmas typically encode >10 secreted proteins, some characterized as effectors [60, 61]. Functionally, these effectors fall into two classes: (1) those inducing witches’ broom, leaf curling, abscission, dwarfism, and sterility by modulating plant defenses or disrupting cellular structures; and (2) those regulating plant-insect interactions to facilitate vector feeding and reproduction.
This study identified multiple known effectors in T. levigata witches’ broom phytoplasma, including TENGU, SAP05, SAP54, and SAP06/48-like. Crucially, 16SrI-group effectors (e.g., SAP11, SAP05, TENGU, SAP54) have defined mechanisms: SAP11 destabilizes TCP transcription factors to promote axillary branching [16, 62]; SAP05 degrades SPL/GATA factors via ubiquitin-independent proteasomal pathways, causing leaf malformation and delayed flowering [41, 63]; SAP54 degrades MADS-box factors via RAD23 interaction, inducing floral reversion [64]. Notably, identical effectors may function differentially across host systems. For instance, SAP11, SWP1, SJP1, and SJP2 require N-terminal nuclear localization signals and C-terminal coiled-coil domains to regulate TCP stability [65], while SAP54 from paulownia witches’ broom phytoplasma induces branching, whereas its AY-WB homolog causes leaf splitting [66, 67]. This functional diversity reflects adaptive strategies shaped by host coevolution. Additionally, five sequence-variable mosaic homologs of uncharacterized secreted proteins were identified. These potential novel host-interaction factors represent targets for elucidating phytoplasma pathogenesis. The witches’ broom symptoms in T. levigata likely arise from synergistic actions of these effectors.
Potential mobile units (PMUs)
PMUs were identified using the eight core genes defined by Bai et al. (tra5, dnaB, dnaG, tmk, hflB, himA, ssb, rpoD) [54]. Three PMU-like regions were identified in T. levigata witches’ broom phytoplasma, exhibiting characteristic organizational heterogeneity and gene rearrangements, consistent with their established role in genomic diversification and horizontal transfer. Contrary to the findings of Huang et al. [41], no known effectors (e.g., SAP11/SAP09) were detected in these PMUs, possibly due to assembly fragmentation. The presence of numerous hypothetical proteins within these canonical mobile genetic elements strongly suggests that they are functional components. Their co-localization with replication genes implies they may be mobilized as a unit, potentially facilitating the rapid evolution of virulence by disseminating these unknown genes across the phytoplasma population. Intriguingly, these regions were enriched in protein-synthesis enzymes (e.g., ATP-dependent Zn protease, methionyl-tRNA synthetase) co-localized with DNA replication/translation genes (dnaB, ssb), potentially enabling rapid protein synthesis during host shifts. FtsHs (encoded by the hflB gene) are membrane-associated ATP-dependent Zn proteases that degrade certain membrane proteins that have not been assembled into complexes. In ‘Candidatus Phytoplasma ziziphi’, the ATP-dependent Zn protease is significantly upregulated during infection. This upregulation may be associated with the functional role of effector proteins in the infection process [68].
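The idea of locating PMU-like regions from clusters of core genes can be sketched as follows. This is a simplified illustration and not the procedure used in the study: the function `find_pmu_like_regions`, the cut-offs `max_gap` and `min_core`, and the toy gene order are all assumptions introduced here. Only the list of eight core genes comes from the text.

```python
from typing import List, Tuple

# Eight PMU core genes listed in the text (after Bai et al.).
PMU_CORE_GENES = {"tra5", "dnaB", "dnaG", "tmk", "hflB", "himA", "ssb", "rpoD"}


def find_pmu_like_regions(
    genes: List[str], max_gap: int = 3, min_core: int = 3
) -> List[Tuple[int, int, List[str]]]:
    """Group nearby core genes along one contig into candidate regions.

    `genes` is the ordered list of gene names on a contig. Core genes whose
    positions differ by at most `max_gap` are placed in the same cluster, and
    clusters with at least `min_core` distinct core genes are reported as
    (first_index, last_index, sorted_core_gene_names). Both cut-offs are
    illustrative, not values taken from the study.
    """
    core_positions = [i for i, g in enumerate(genes) if g in PMU_CORE_GENES]

    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in core_positions:
        # Start a new cluster when the distance to the previous core gene is too large.
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    regions = []
    for cluster in clusters:
        distinct = sorted({genes[i] for i in cluster})
        if len(distinct) >= min_core:
            regions.append((cluster[0], cluster[-1], distinct))
    return regions


# Toy gene order on a hypothetical contig (not data from the study).
contig = ["rpsA", "tra5", "hyp1", "dnaB", "dnaG", "hyp2", "tmk",
          "gyrA", "gyrB", "pgk", "eno", "ssb", "secA"]
print(find_pmu_like_regions(contig))
```

In the toy contig, tra5, dnaB, dnaG and tmk form one candidate region, while the isolated ssb is not reported. On a fragmented assembly such a scan can only find regions that lie within a single contig, which is one reason PMUs or their cargo genes may be missed.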
A putatively horizontally transferred gene encoding asparagine synthetase, an enzyme essential for nitrogen metabolism, was identified on scaffold_6. This finding aligns with the previously observed abundance of amino acid transporters, highlighting the phytoplasma’s reliance on external nitrogen sources. Supporting this, Wei et al. documented an upregulation of asparagine synthetase in Solanum lycopersicum during phytoplasma infection, underscoring the enzyme’s significance and its role in pathogenesis [69]. These findings suggest that horizontal transfer of asparagine synthetase may constitute an adaptive metabolic strategy for pathogenicity.
Limitations of the study
Although this study provides important insights into the genome of the phytoplasma associated with T. levigata witches’ broom disease, several limitations should be noted. The analysis relied on only three technical replicates and one negative control, which may limit the generalizability of the results given the sparse natural distribution of infected plants. The genome assembly is highly fragmented (380 contigs, N50 = 23,241 bp), due to the use of Illumina short-read sequencing and the lack of a reference genome for the host plant, complicating the separation of phytoplasma-derived sequences. This fragmentation increases the risk of missing genomic regions, incomplete gene annotation, and failure to detect structurally complex elements such as PMUs or effector genes. To enhance assembly continuity, future work may focus on enriching phytoplasma DNA followed by long-read sequencing using platforms such as PacBio or Oxford Nanopore.
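For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of its usual definition is given here: contigs are sorted from longest to shortest, and the N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not data from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative length first reaches at least half
    of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example (not the study's contigs): total = 300, half = 150.
# Cumulative sums are 100, 180, ... so the N50 is 80.
print(n50([100, 80, 60, 40, 20]))
```

A low N50 relative to genome size, as reported here, indicates that much of the assembly sits in short contigs, which is why multi-gene structures such as PMUs are more likely to be split across contig boundaries.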
Conclusions
Through systematic genomic analysis of a phytoplasma associated with T. levigata witches’ broom disease in Yunnan, China, this study generated a high-quality draft genome (849.7 kb, 94.1% completeness) of a novel 16SrI group ‘Ca. P. asteris’ strain (provisionally designated T. levigata witches’ broom phytoplasma). The genome exhibits classic reductive features of phytoplasmas while uniquely retaining complete glycolysis and pyruvate metabolism pathways. It additionally possesses specialized polyamine/amino acid transporter systems, suggesting an adaptive strategy involving hijacking of host nitrogen metabolism. The identification of 31 secreted proteins, three potential mobile units, and a putative horizontal gene transfer locus encoding asparagine synthetase provides new insights into phytoplasma pathogenesis. Methodologically, the use of the Phytoassembly pipeline enabled genome resolution without pathogen purification. This establishes a molecular foundation for controlling T. levigata witches’ broom disease and offers a methodological framework for studying other phytoplasmas. Future research should focus on the functional validation of effectors and investigation of host-interaction mechanisms mediated by horizontally acquired genes to elucidate phytoplasma-host co-evolutionary dynamics.
Supplementary Information
Supplementary Material 1. Additional file S1: Supplementary Table S1. Supplementary Table 1: RAST annotation of the genome of Trema levigata witches’ broom phytoplasma; Supplementary Table 2: Phylogenetic analysis data; Supplementary Table 3: Results of eggNOG annotation of CDS of Trema levigata witches’ broom phytoplasma; Supplementary Table 4: Potential secretory proteins; Supplementary Table 5: Results of MotifFinder PMU
Supplementary Material 2. Additional file S2: Supplementary figure S2. Supplemental Fig. 1: Original agarose gel image of PCR detection for phytoplasma; Supplemental Fig. 2: Field symptoms of witches’ broom disease on Trema levigata
References
- 1. Rodrigues Jardim B, Gambley C, Tran-Nguyen LTT, Webster C, Kehoe M, Kinoti WM, Bond S, Davis R, Jones L, Pathania N, et al. A metagenomic investigation of Phytoplasma diversity in Australian vegetable growing regions. Microb Genomics. 2024;10(3):001213. doi:10.1099/mgen.0.001213. PMCID: PMC10999746. PMID: 38446015.
- 2. Zhang R-Y, Wang X-Y, Li J, Shan H-L, Li Y-H, Huang Y-K, He X-H. Complete genome sequence of ‘Candidatus Phytoplasma sacchari’ obtained using a filter-based DNA enrichment method and nanopore sequencing. Front Microbiol. 2023;14:1252709. doi:10.3389/fmicb.2023.1252709. PMCID: PMC10577292. PMID: 37849920.
- 3. Tokuda R, Iwabuchi N, Kitazawa Y, Nijo T, Suzuki M, Maejima K, Oshima K, Namba S, Yamaji Y. Potential mobile units drive the horizontal transfer of Phytoplasma effector phyllogen genes. Front Genet. 2023;14:1132432. doi:10.3389/fgene.2023.1132432. PMCID: PMC10210161. PMID: 37252660.
- 4. Schneider B, Seemueller E, Smart CD, Kirkpatrick BC. E6 - Phylogenetic classification of plant pathogenic mycoplasma-like organisms or phytoplasmas. In: Razin S, Tully JG, editors. Molecular and Diagnostic Procedures in Mycoplasmology. San Diego: Academic Press; 1995. p. 369–380.
- 5. Huang C-T, Cho S-T, Lin Y-C, Tan C-M, Chiu Y-C, Yang J-Y, Kuo C-H. Comparative genome analysis of ‘Candidatus Phytoplasma luffae’ reveals the influential roles of potential mobile units in Phytoplasma evolution. Front Microbiol. 2022;13:773608. doi:10.3389/fmicb.2022.773608. PMCID: PMC8923039. PMID: 35300489.
- 6. Zhao Y, Wei W, Davis RE, Lee I-M, Bottner-Parker KD. The agent associated with blue dwarf disease in wheat represents a new phytoplasma taxon, ‘Candidatus Phytoplasma tritici’. Int J Syst Evol Microbiol. 2021;71:004604. doi:10.1099/ijsem.0.004604. PMID: 33464199.
- 7. Bertaccini A, Arocha-Rosete Y, Contaldo N, Duduk B, Fiore N, Montano HG, Kube M, Kuo C-H, Martini M, Oshima K, et al. Revision of the ‘Candidatus Phytoplasma’ species description guidelines. Int J Syst Evol Microbiol. 2022;72:005353. doi:10.1099/ijsem.0.005353. PMID: 35471141.
- 8. Majumdar R, Minocha R, Lebar MD, Rajasekaran K, Long S, Carter-Wientjes C, Minocha S, Cary JW. Contribution of maize polyamine and amino acid metabolism toward resistance against Aspergillus flavus infection and aflatoxin production. Front Plant Sci. 2019;10. doi:10.3389/fpls.2019.00692. PMCID: PMC6543017. PMID: 31178889.
