Evolutionary and Modification Features of Two Monkeypox Virus Strains: Insights from Integrated Genomic and Epigenomic Analyses
Zhongru Zhao, Bohan Zhang, Jingwan Han, Dandan Lin, Yongjian Liu, Lei Jia, Hanping Li, Jingyun Li, Xiaolin Wang, Hongling Wen, Lin Li

TL;DR
This study explores the evolution and epigenetic features of two monkeypox virus strains to better understand their transmission and adaptation.
Contribution
The study introduces an integrated genomic and epigenomic analysis framework for monkeypox virus strains.
Findings
Two monkeypox virus isolates belong to the C.1.1 lineage and diverged into distinct clades.
APOBEC3-associated mutations dominate in both isolates, indicating a significant mutational pattern.
Epigenetic modifications like 5hmC and 6mA are enriched in ITR regions and show heterogeneity between strains.
Abstract
Since 2022, global outbreaks of monkeypox virus (MPXV) have been repeatedly designated by the World Health Organization (WHO) as a public health emergency of international concern (PHEIC), underscoring the urgent need to elucidate the multidimensional mechanisms underlying viral evolution and transmission. Current understanding remains largely focused on genomic variation, while the critical role of epigenetic regulation has been considerably overlooked. To address this gap, this study integrates high-throughput evolutionary genomic analysis with whole-genome DNA methylation profiling. Using parallel Illumina and Nanopore sequencing platforms, we comprehensively characterized two clinically derived MPXV isolates collected locally. The results revealed that both isolates belonged to the C.1.1 ancestral lineage, diverging into distinct clades (E.3 and E.4, respectively, supporting the…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6- —National Key Research and Development Program of China
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPoxvirus research and outbreaks · Genomics and Phylogenetic Studies · vaccines and immunoinformatics approaches
1. Introduction
Mpox is a zoonotic disease caused by the monkeypox virus (MPXV) [1,2,3,4,5] and its clinical manifestations in humans resemble those of smallpox but with substantially lower morbidity and mortality [1,6,7]. From the onset of the global mpox outbreak in May 2022 to 30 September 2025, a total of 165,210 confirmed cases and 439 deaths have been reported worldwide, spanning more than 100 countries and regions [8,9,10,11]. The World Health Organization (WHO) designated the outbreak as a Public Health Emergency of International Concern (PHEIC) twice, on 23 July 2022 [12], and 14 August 2024 [13]. During this outbreak, global scientific efforts have largely focused on applying genomic epidemiology approaches to trace transmission chains and decipher the evolutionary dynamics of the virus [7,14]. Sequencing analyses of viruses isolated from these cases have revealed that the currently circulating Clade IIb, particularly the B.1 lineage, exhibits signs of accelerated evolution, with a mutation rate markedly exceeding historical levels [14,15,16].
MPXV is a double-stranded DNA virus of the genus Orthopoxvirus (family Poxviridae), with a linear genome of approximately 197 kb encoding nearly 175 non-redundant orthologous poxvirus genes (OPGs). These OPGs participate in multiple stages of the viral life cycle, including replication, assembly, and host interaction [7,17,18]. Based on linear genomic sequences, studies have successfully constructed functional gene maps of MPXV [19,20,21,22] and systematically identified key viral virulence and immune evasion factors. For instance, the membrane fusion component OPG147 facilitates immune evasion by targeting the host MITA/STING signaling pathway to suppress innate immune responses [23]. The core protease I7, essential for viral particle maturation, has been validated as a novel target for broad-spectrum anti-orthopoxvirus drugs [24]. In vaccinia virus, different isoforms of host topoisomerase 2 (TOP2) recruited into viral factories play opposing roles—promoting viral DNA replication versus activating host defense mechanisms [25]. These findings underscore the foundational importance of linear sequence analysis in elucidating viral gene function. However, a major limitation of the current research paradigm is its overreliance on linear genomic information, while largely overlooking epigenetic modifications—a regulatory layer that may profoundly shape viral biology.
DNA methylation and other epigenetic modifications are essential mechanisms governing gene expression. In virology, 5-methylcytosine (5mC) has been established as a significant epigenetic marker influencing multiple DNA viruses’ replication cycles, latency establishment, and immune evasion strategies [26]. However, research on DNA modifications within the Poxviridae family remains comparatively underdeveloped relative to other viral taxa. Emerging evidence suggests certain poxviruses, including molluscum contagiosum virus, encode putative methyltransferase-like domains that potentially facilitate autonomous DNA modification [27], while vaccinia virus infection models demonstrate substantial remodeling of the host epigenetic landscape, including altered histone methylation patterns [28]. Furthermore, Nanopore sequencing of a misidentified camelpox vaccine strain, later shown to be vaccinia virus, has identified potential epigenetic modification signatures within its genomic architecture [29]. Collectively, these findings from diverse methodological approaches indicate that poxviruses may employ both direct and indirect mechanisms to modulate DNA modification states, suggesting the existence of a more sophisticated epigenetic regulatory network than previously recognized. Specifically regarding MPXV, comprehensive understanding of its epigenetic landscape remains substantially limited, with the potential roles of more dynamic modifications like 5-hydroxymethylcytosine (5hmC) and the prokaryotic-associated N6-methyladenine (6mA) in MPXV genomic regulation and function remaining entirely unexplored.
The recent maturation of Nanopore sequencing platforms has enabled high-resolution mapping of native DNA modifications [30,31,32,33,34] providing unprecedented capability for comprehensive investigation of viral epigenomic regulation. This study implements an integrated approach combining whole-genome variation analysis with systematic profiling of three DNA methylation modifications (5mC, 5hmC, 6mA) across two distinct MPXV strains, aiming to characterize their multi-dimensional divergence at both genetic and epigenetic levels, thereby advancing our understanding of MPXV evolutionary dynamics and pathogenic mechanisms through novel molecular perspectives.
2. Materials and Methods
2.1. Sample Sources
The two viral DNA samples used in this study were provided by the Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Beijing, China. The MPXV control samples were generated through multiplex PCR amplification targeting the viral genome, as described in detail in Section 2.3. All procedures for sample collection, processing, and analysis were strictly conducted in accordance with China’s National Guidelines for Mpox Diagnosis and Treatment (2022 Edition) [35] and the Mpox Prevention and Control Protocol [36].
2.2. MPXV Real-Time Quantitative PCR Detection
MPXV DNA was detected and quantified using a TaqMan probe-based real-time quantitative PCR assay targeting the viral F3L gene on a LightCycler^®^ 480 II Real-Time PCR System (Roche Diagnostics, Basel, Switzerland). [35]. The primers and probe sequences used were as follows: forward primer F3L-F (5′-CTCATTGATTTTTCGCGGGATA-3′), reverse primer F3L-R (5′-GACGATACTCTCCTCGTTGGT-3′), and a dual-labeled probe F3L-P (5′-[FAM]CATCAGAATCTGTAGGCCGT[MGB]-3′). The cycling protocol was as follows: 50 °C for 2 min, 95 °C for 10 min, followed by 40 cycles of denaturation at 95 °C for 30 s and annealing/extension at 60 °C for 1 min. To enable absolute quantification, a standard curve was generated using a serial dilution of an in-house prepared plasmid containing the F3L target sequence. The plasmid concentration was determined spectrophotometrically, and its copy number was calculated based on molecular weight.
2.3. Preparation and Validation of the Whole-Genome Unmodified Control
For epigenetic analysis of MPXV, a whole-genome unmodified control was established. Briefly, viral genomic DNA extracted from MPXV-infected Vero E6 cells was used as the template. To ensure unbiased coverage of the entire viral genome (~197 kb), it was divided into 59 overlapping fragments with an average length of 3.1 kb (primer sequences are provided in Table S1). All fragments were individually amplified using Premix Taq™-Ex Taq™ Version 2.0 plus dye (RR902A) (Takara Bio Inc., Shiga, Japan), under unified optimized PCR conditions consisting of an initial denaturation at 98 °C for 10 s, followed by 30 cycles of denaturation at 98 °C for 10 s, annealing at 55 °C for 30 s, and extension at 72 °C for 1 min per kb. The amplification products were confirmed by 1% agarose gel electrophoresis and purified using the Wizard^®^ SV Gel and PCR Clean-Up System (A9282) (Promega Corporation, Madison, WI, USA). The purified 59 fragments were accurately quantified using a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and subsequently pooled at equimolar ratios to constitute the “whole-genome unmodified DNA pool”.
This control pool was validated by Oxford Nanopore direct DNA sequencing (Oxford Nanopore Technologies, Oxford, UK). Specifically, no less than 1 µg of the pooled DNA was subjected to end-repair, followed by adapter ligation to construct a standard DNA sequencing library. The library was sequenced on a MinION Mk1C platform (Oxford Nanopore Technologies, Oxford, UK). The raw signal data were base-called using Dorado v0.9.5 [37], and the resulting reads were aligned to the MPXV reference genome (NC_063383.1) using Minimap2 v2.28-r1209. Furthermore, leveraging the inherent sensitivity of Nanopore technology to DNA base modifications, subsequent analysis verified that this PCR-prepared control pool did not contain significant DNA modification signals across the entire genome, thereby validating its effectiveness as an unmodified baseline control. Detailed comparative data of modification sites between the samples and the control are provided in the Supplementary Materials (Table S2).
2.4. Whole-Genome Sequencing of MPXV
Two distinct strategies were employed for whole-genome sequencing. Strategy 1: Genomic DNA was randomly fragmented into approximately 350 bp fragments using a Covaris shearing system (Covaris, Woburn, MA, USA). Library preparation was conducted with the Rapid Plus DNA LibPrep Kit for Illumina (RK20208) (ABclonal Technology, Wuhan, Hubei, China), followed by paired-end sequencing (PE150) on the Illumina NovaSeq X Plus platform (Illumina, San Diego, CA, USA). This strategy was applied to generate high-accuracy short-read data for downstream whole-genome analyses, including single-nucleotide polymorphism (SNP) identification, lineage assignment, and phylogenetic reconstruction. Strategy 2: Viral DNA integrity and fragment size distribution were assessed using the Qsep1-Plus portable capillary electrophoresis system (BiOptic Inc., New Taipei City, Taiwan) with the S3 standard cartridge. Subsequently, the Ligation Sequencing Kit V14 (SQK-LSK114) (Oxford Nanopore Technologies, Oxford, UK) and NEBNext^®^ Companion Module v2 (E7672S) (New England Biolabs, Ipswich, MA, USA) were used for end repair and sequencing adapter ligation to construct the sequencing library. Sequencing was performed on the Oxford Nanopore Technologies MinION Mk1C platform using the R10.4.1 flow cell (FLO-MIN114) (Oxford Nanopore Technologies, Oxford, UK). The sequencing process was controlled by MinKNOW v24.06.15. This strategy was specifically designed for direct detection of DNA base modifications, leveraging Nanopore sequencing’s ability to preserve native DNA molecules and capture raw electrical signal information required for epigenetic analysis.
2.5. Whole-Genome Sequence Assembly and Mutation Analysis
The MPXV reference genome NC_063383.1 was retrieved from the NCBI database. Raw sequencing data were quality-controlled and filtered using fastp v0.24.1 [38] and FastQC v0.12.1, with the following thresholds: minimum base quality of 20, maximum allowable proportion of low-quality bases set to 30%, and minimum read length of 150 bp. Host genome contamination was subsequently removed using bowtie2 v2.5.4 [39] and bwa-mem2 v2.2.1 [40], and high-quality sequencing reads were aligned to the reference genome. Variant calling was performed with bcftools v1.22 [41] using parameters: mapping quality ≥ 25 and base quality ≥ 20. Low-confidence variants located within 500 bp of the genome termini with quality scores < 20 or sequencing depth < 10× were excluded. Finally, high-confidence variants were functionally annotated using SnpEff v5.2 [42].
2.6. Phylogenetic and Molecular Evolution Analysis
A total of 3196 complete genomes longer than 196 kb were retrieved from the NCBI Virus Database (https://www.ncbi.nlm.nih.gov/labs/virus; accessed on 25 September 2025), and 5474 high-coverage complete genomes containing all coding sequences were obtained from GISAID (https://www.gisaid.org) (data as of 25 September 2025). After merging and deduplication, sequence quality was assessed using NextClade v3.17.0 [43], resulting in 6387 high-quality genomes for the reference dataset (Table S3). Preliminary lineage identification via the NextClade online platform (https://clades.nextstrain.org; accessed on 25 September 2025) classified Sample 1 as lineage E.3 and Sample 2 as lineage E.4. For phylogenetic analysis, all E.3 and E.4 genomes from the reference dataset were extracted, and three representative sequences were randomly selected from each of the remaining lineages, combined with the two MPXV samples from this study. Multiple sequence alignment was performed using MAFFT v7.525 [44] and refined with trimAl v1.5.0 [45]. Maximum-likelihood phylogenetic trees were constructed with IQ-TREE v3.0.1 [46], using ModelFinder (implemented in IQ-TREE v3.0.1) for optimal nucleotide substitution model selection and 1000 bootstrap replicates for node support evaluation. Final visualization and annotation were conducted using the iTOL v7 online tool.
2.7. DNA Methylation Analysis
Sequencing data (POD5 files) were base-called using Dorado v0.9.5 [37] in super-accurate mode. For systematic detection of DNA methylation, two modification models were applied: [email protected]_5mC_5hmC@v3 and [email protected]_6mA@v3. Processed high-quality reads were aligned to the MPXV reference genome (NC_063383.1) using minimap2 v2.28-r1209 with preset parameters (-x map-ont). SAMtools v1.21 was used to extract alignment statistics including base distribution, sequencing quality, read length, and per-site sequencing depth. Epigenetic modification data from Dorado were analyzed using modkit v0.5.0 [47]. The 5th percentile of modification probability distribution was set as the threshold (percentiles = 0.05) for genome-wide quantification of 5mC, 5hmC, and 6mA modification frequencies, with specific evaluation of CpG motif modifications. Differential modification sites were identified using modkit’s beta distribution model to compare modification levels between MPXV samples and controls, retaining sites meeting significance thresholds (map_pvalue ≤ 0.05, mod_ratio ≥ 0.2).
2.8. Statistical Analysis
All data analyses and visualizations were performed using Python v3.8.20 and R v4.4.3.
3. Results
3.1. Summary of Sequencing Data and Analytical Approach
To gain insights into the evolutionary and methylation modification features, DNA of the two samples was extracted for further sequencing. The quality of sequencing data was then systematically evaluated. Illumina NovaSeq data from both samples met the requirements for analysis, with stable per-cycle base quality (mean > Q30, error rate < 0.1%) enabling reliable single nucleotide polymorphisms (SNPs) identification (Figure 1A). GC content and nucleotide distributions were as expected, and insert size distributions were tightly clustered, indicating minimal sequencing bias and high library preparation quality (Figure 1B–D). Viral genome coverage analysis showed average depths exceeding 280×, with >99.9% of regions covered at ≥30×, supporting low-frequency variant detection (Figure 1E). Oxford Nanopore sequencing results showed that MPXV-targeted read proportions were 0.17% (Sample 1), 1.15% (Sample 2), and 99.44% (controls), aligning with experimental expectations (Figure 2(AI–AIII)). Genome coverage depths were 74× (Sample 1), 820× (Sample 2), and 26,552× (controls), with no significant coverage gaps and uniform coverage (Figure 2(BI–BIII)). Long-read sequencing showed comparable read length distributions across samples: Sample 1 had an average read length of 727 bp (N50 = 2810 bp), Sample 2 averaged 1403 bp (N50 = 2606 bp), and control samples averaged 2839 bp (N50 = 3142 bp), consistent with expected characteristics of Nanopore sequencing (Figure 2C). Bubble plots confirmed most reads clustered in long-length/high-quality regions, verifying data suitability (Figure 2D). Our integrated approach leveraged Illumina data for error correction and SNPs detection, while Nanopore long reads enabled structural resolution and direct epigenetic modification detection. This strategy achieved gap-free genome assembly, validated genomic integrity, established phylogenetic foundations, and permitted direct detection of DNA base modifications.
3.2. Phylogenetic Characteristics of the Two MPXV Isolates
To determine the transmission origins of the two MPXV cases, we constructed a maximum likelihood phylogenetic tree incorporating two locally sequenced genomes together with 203 representative global reference genomes retrieved from GISAID and GenBank (Figure 3, Table S3). Phylogenetic analysis showed that the two locally identified strains belonged to distinct lineages: Sample 1 clustered within lineage E.3, whereas Sample 2 was assigned to lineage E.4, with both lineages descending from the ancestral C.1.1 clade (Figure 3). Notably, the two samples did not cluster together, indicating independent evolutionary histories. Specifically, Sample 1 (E.3) grouped with contemporaneous sequences sampled between late 2023 and early 2024 from multiple regions, including Brazil, the United States, the Netherlands, Germany, Portugal, and several provinces in China (e.g., Jiangsu and Guangdong). In contrast, Sample 2 (E.4) formed a well-supported clade with sequences from Australia, Hungary, Japan, India, and regions of China including Guangdong and Taiwan, most of which were collected in mid to late 2024. The clear phylogenetic separation of the two local isolates into different C.1.1-derived sublineages, together with their close relationships to geographically diverse international strains, provides evidence for at least two independent viral introduction events into the region, followed by parallel local transmission processes, rather than transmission arising from a single local source. Although the interpretation of transmission dynamics may be influenced by sampling biases in global sequence repositories and the limited number of locally sequenced genomes, these findings nonetheless offer important preliminary evidence for multiple introduction pathways and underscore the necessity of integrating expanded genomic surveillance with detailed epidemiological data to more precisely reconstruct local transmission routes.
3.3. APOBEC3-Driven Evolution and Lineage-Specific Mutations in Mpox Virus Revealed by Genomic Analysis
To characterize the micro-evolutionary features of the viral strains, we conducted whole-genome alignment of both samples against the reference strain (GenBank: NC_063383.1), which systematically revealed their genomic variation profiles. Comparative genomic analysis identified 79 and 85 SNPs in Sample 1 and Sample 2, respectively. Sample 1 contained 78 base substitutions and one deletion, while Sample 2 comprised 83 substitutions and two deletions. Phylogenetic analysis confirmed 70 shared SNPs between the two samples, supporting their common evolutionary origin. Genomic distribution analysis showed that 42 mutations in each sample were in the central core region, with the remaining 36 (Sample 1) and 43 (Sample 2) mutations distributed in terminal variable regions and inverted terminal repeats (ITRs). Functional annotation revealed 50 non-synonymous and 29 synonymous mutations in Sample 1, compared to 55 non-synonymous and 30 synonymous mutations in Sample 2 (Figure 4A). Notably, both samples exhibited prominent APOBEC3-mediated editing signatures. Sample 1 contained 67 canonical APOBEC3-driven mutations (29 TC > TT and 38 GA > AA), while Sample 2 showed 66 such mutations (27 TC > TT and 39 GA > AA) (Figure 4A), indicating substantial host immune editing pressure during viral evolution. SnpEff annotation identified 39 moderate/high-impact mutations in each sample (Figure 4A), with seven unique moderate/high-impact mutations specific to each strain (Figure 4B). Sample 1’s unique mutations were enriched in viral DNA replication and assembly genes, including OPG055, OPG056, OPG057 (encoding protein F11, EEV maturation protein, and palmytilated EEV membrane protein, respectively), OPG094 and OPG105 (encoding virion core protein D3 and the RNA polymerase-associated transcription-specificity factor RAP94, respectively), OPG164 (encoding an IEV transmembrane phosphoprotein, C142797T, p.Ser7Leu), and OPG210 (encoding a B22R family surface glycoprotein, containing a cadherin-like domain). Sample 2’s unique mutations predominantly involved transcriptional regulation and host interaction genes, including OPG023 (encoding an ankyrin repeat protein homologous to VACV D7L, C12062T, p.Arg280Lys), OPG061 (encoding a non-functional serine recombinase, similar to VACV-WR C22L and VACV-Cop F16L), OPG092 (encoding the virion assembly protein G7), OPG114 (encoding virion protein D2), OPG137 (encoding a viral membrane formation protein, C119920T, p.Arg107Cys), OPG153 (encoding an A-type inclusion protein, ortholog of VACV A26L/A30L), and OPG192 (encoding a virulence protein, ER-resident). Furthermore, sliding window analysis (1000 bp windows) revealed relatively even genome-wide variation distribution with localized clustering (Figure 4B). Sample 2 exhibited higher variation density in window 150,001–151,000 (Figure 4B), separately, sample-unique variations were identified in multiple windows. Those unique to Sample 1 included windows 20,001–21,000 and 36,001–37,000, among others, whereas Sample 2-specific windows included 9001–10,000 and 10,001–11,000 (Figure 4B). Despite localized differences, the overall variation distribution patterns remained consistent between samples (Figure 4B), supporting independent micro-evolution within a shared evolutionary framework. We conducted systematic retrieval and comparative analysis of all sequences belonging to E.3 and E.4 lineages in the reference dataset. Through comparison with the reference genome NC_063383.1, we first identified the unique key mutations carried by our two samples within their respective sub-lineages. Sample 1 (E.3 lineage) contained five unique key mutations: G36431A (OPG055, p.Ser218Leu), C38476T (OPG056, p.Glu187Lys), C39430T (OPG057, p.Asp256Asn), G73790A (OPG094, p.Gly4Glu), and C186253T (OPG210, p.Ser1633Leu). Sample 2 (E.4 lineage) carried one unique key mutation: A167098T (OPG192, p.Glu123Val). Furthermore, we comprehensively screened all lineages in Clade IIb from the reference database and constructed nucleotide variation profiles for each lineage (Figure 4C). Based on this analysis, we identified and annotated characteristic mutations specific to E.3 and E.4 lineages relative to other lineages (Figure 4C). In the E.3 lineage, prominent specific variants included C38476T (OPG056, p.Glu187Lys) and C84150T (OPG105, p.Leu1009Phe). In the E.4 lineage, we identified three specific missense mutations: G41806A (OPG061, p.His27Tyr), C94424T (OPG114, p.Asp82Tyr), and A167098T (OPG192, p.Glu123Val). Additionally, the E.4 lineage contained one high-impact nonsense mutation C72485T (OPG092, p.Glu155*) and one synonymous mutation C136791G (OPG153, p.Glu293=). Collectively, these sites are distributed across genes critical for viral replication, transcriptional regulation, and host interactions. They likely contribute to molecular evolution within each lineage and represent potential lineage-specific markers with biological significance.
3.4. Epigenetic Landscape of MPXV Reveals 5hmC/6mA Enrichment and Lineage-Specific Modification Patterns
To systematically characterize the epigenetic modification profile of MPXV, we performed comprehensive DNA methylation analysis on both viral isolates. Using PCR-amplified MPXV whole-genome sequences as an unmodified control, we identified three modification types (5mC, 5hmC, and 6mA) with the modkit tool, applying a significance threshold of p ≤ 0.05 and a mod_ratio difference ≥ 0.2 between sample and control as the cutoff for genuine modifications. Notably, no significant 5mC modifications were detected in either sample (Figure 5 and Figure 6 and Table 1).
Analysis identified 15 modification sites in Sample 1 and 17 modification sites in Sample 2 (Figure 5 and Figure 6 and Table 1). In Sample 1, one 5hmC site and fourteen 6mA sites were detected (Figure 5A,B and Table 1), with the majority of 6mA modifications located on the positive strand. Several 6mA sites were mapped to key viral genes (Table 1), including OPG001 (chemokine-binding protein), OPG145 (DNA helicase), and OPG150 (intermediate transcription factor VITF-3), as well as additional annotated genes primarily involved in viral replication, transcription, and immune modulation, whereas the remaining sites were not associated with annotated genes and may reside in intergenic regions or represent uncharacterized regulatory elements (Table 1). In Sample 2, six 5hmC sites and eleven 6mA sites were identified (Figure 6A,B and Table 1). Notably, multiple 5hmC and 6mA modifications were detected within genes related to immune evasion and transcriptional regulation (Table 1), and several 5hmC sites (1656, 2165, 195,045 and 195,554) were located within CpG islands (Figure 6A). Both samples exhibited pronounced enrichment of DNA modifications in the ITR regions, with 5hmC showing substantially stronger signals than 5mC (Figure 5 and Figure 6). To accurately interpret these signals despite the sequence identity of the two ITRs, we analyzed strand-specific patterns. Methylation peaks were concurrently present at the 5′ ends of both the positive and negative strands. Because the 5′ end of the negative strand corresponds to the 3′ physical terminus of the viral genome, this symmetric “dual 5′-end peak” pattern provides direct evidence that DNA modifications occur at both genomic ends. This observation supports the interpretation that the enrichment of modifications in the ITR regions reflects genuine biological features rather than random mapping artifacts. Comparative analysis revealed three conserved modification sites shared by both samples, including one 5hmC site (195,045) and two 6mA sites (247 and 410) (Figure 5 and Figure 6 and Table 1). In contrast, distinct modification patterns were observed between the two samples. Sample 1 contained twelve unique 6mA sites distributed across both strands (Figure 5B), whereas Sample 2 displayed a more complex modification landscape with five unique 5hmC sites and nine unique 6mA sites (Figure 6A,B and Table 1). These differences represent sample-associated variation in DNA modification patterns observed under the experimental conditions analyzed. Given the limited number of viral genomes examined, further studies incorporating multiple independently prepared viral samples will be required to determine whether these patterns reflect stable lineage-associated features or condition-dependent epigenetic variability.
4. Discussion
Since the 2022 global mpox outbreak initiated, the virus has demonstrated dynamically evolving transmission patterns, with the epidemic epicenter notably transitioning from European and American regions toward the Asia-Pacific basin during 2022–2024 as case numbers progressively increased across APAC countries [8,10,11]. China’s dual status as both a pivotal Asia-Pacific nation and one of the ten most significantly affected countries globally renders its epidemiological situation particularly crucial [11], especially considering Guangdong Province—reporting the highest case burden in mainland China—has revealed substantial MPXV molecular evolution through comprehensive genomic surveillance implemented during the outbreak since 2023 [7,14]. This context renders understanding MPXV’s evolutionary patterns and trajectory paramount, though sustained regional whole-genome surveillance reports have remained limited since 2023 [48,49,50,51,52,53,54,55], a critical knowledge gap our study addresses through high-resolution genomic analysis of local cases, providing valuable data to elucidate MPXV’s evolutionary dynamic.
Our phylogenetic analysis provides direct evidence for this transmission complexity, demonstrating that while China’s first reported September 2022 case in Chongqing belonged to lineage B.1 [56], subsequently reported viral genomes from multiple locations showed no epidemiological linkage to this index case [16,50,51,57] indicating multiple independent importation events into mainland China—a pattern our findings further substantiate by revealing that Samples 1 (E.3) and 2 (E.4), though both tracing back to ancestral C.1.1 lineage, form distinct clades strongly suggesting concurrent circulation of at least two independent transmission chains during the same period. Sample 1 clusters with reference sequences from Brazil, the United States, the Netherlands, Germany, Portugal, and China’s Jiangsu and Guangdong provinces, while Sample 2 branches with sequences from Australia, Hungary, Japan, India, and China’s Guangdong Province and Taiwan region, revealing complex cross-regional and local transmission networks consistent with national observations [58,59,60,61,62] where determining whether MPXV sequences resulted from direct overseas importation or domestic interprovincial spread remains challenging without additional epidemiological evidence. Given sampling biases in global databases and the limited sample size of our study, our conclusions represent important preliminary evidence that precisely underscores the urgency of establishing more comprehensive and representative global and local genomic surveillance systems [58,59,60,61,62].
At the molecular level, the mpox outbreak since 2022 has been characterized by accelerated mutation rates and altered transmission dynamics [8,14,63], drawing significant research attention to its underlying mechanisms. Current understanding of MPXV molecular evolution primarily encompasses three key processes: accumulation of specific mutations, viral recombination events, and evolution of codon usage preferences [16,64,65]. Particularly noteworthy is the growing evidence for APOBEC3-mediated hypermutation, whereby host antiviral proteins drive extensive viral genome editing [14,16,59,64,66]. Forni et al. [67] provided systematic evidence that such APOBEC3 editing signatures are widespread across human-infecting orthopoxviruses, including MPXV and variola virus, based on the analysis of 1624 human MPXV1 genomes. This mutational signature is characterized by a strong C-to-T/G-to-A bias with a pronounced preference for 5′-TC-3′ sequence contexts, consistent with the known editing specificity of human APOBEC3 enzymes. Notably, APOBEC3-associated mutations were found to be enriched in highly expressed viral genes, suggesting that transcription-associated exposure of single-stranded DNA may increase susceptibility to host-mediated editing. Moreover, these signatures were prominent in human-adapted viruses such as variola virus but largely absent in orthopoxviruses predominantly infecting animal hosts, strongly implicating APOBEC3 editing as a host-specific evolutionary force in humans.
Our findings provide substantial support for this mechanism, demonstrating that both viral samples accumulated numerous SNPs, with APOBEC3-mediated mutations constituting 84.8% and 77.6% of total mutations in Samples 1 and 2, respectively. The APOBEC3 enzyme family, known to be upregulated during viral infection, can suppress diverse viruses through both deaminase-dependent and independent pathways [68]. Intriguingly, sublethal APOBEC3 activity may paradoxically promote viral evolution by generating hypermutated but viable variants with novel characteristics—a phenomenon well-established in HIV pathogenesis [69] that appears equally relevant to MPXV evolution.
Our analysis further revealed high-frequency APOBEC3 signature mutations, predominantly TC > TT and GA > AA transitions, that closely match the editing preference of human APOBEC3A [70]. Given APOBEC3A’s specific expression in keratinocytes and skin tissues, this finding offers mechanistic insight into MPXV’s pronounced cutaneous tropism. The substantial diversity among APOBEC3 enzymes in their nucleotide sequence preferences, combined with their tissue-specific and species-specific expression patterns [71], likely contributes to the distinct mutational profiles observed across different transmission chains.
Importantly, recent epidemiological and genomic evidence suggests that APOBEC3-associated mutational processes were already active before the 2022 global outbreak. Ndodo et al. [72] demonstrated that multiple genetically distinct MPXV lineages were co-circulating in humans in Nigeria during 2019–2020, including the A.2 lineage, indicating sustained human-to-human transmission prior to widespread global dissemination. Their study further showed that these pre-2022 lineages continued to accumulate APOBEC3-like mutations, highlighting the long-standing role of host-driven editing during cryptic transmission. Notably, all A.2 lineage isolates harbored a lineage-defining nonsense mutation in the A46R gene that is consistent with an APOBEC3-mediated editing event, suggesting that APOBEC3 activity may directly contribute to viral host adaptation by modulating genes involved in innate immune regulation.
In this context, the global outbreak likely created compressed transmission chains that amplified the frequency and visibility of APOBEC3-associated mutations [7,62,73], with evidence suggesting the virus preferentially accumulates such mutations with minimal adaptation costs under persistent natural selection [7,64]. However, definitive attribution of the post-2022 mutational excess solely to human APOBEC3-mediated editing remains elusive, as the epidemiological complexity and APOBEC3’s potential evolutionary role introduce substantial uncertainty regarding outbreak origins and transmission pathways. Nevertheless, these observations collectively suggest that APOBEC3-mediated evolution represents a persistent process that may emerge as a dominant driver of MPXV adaptation within human populations.
While APOBEC3 activity generates substantial mutational diversity, it is the key variants subsequently fixed by natural selection that ultimately determine lineage differentiation and potential functional divergence. Within this evolutionary context, the missense mutation (p.Arg280Lys) in the OPG023 gene is of particular interest. This gene has been identified as a high-frequency mutation locus in MPXV evolution, with phylogenetic analyses indicating it is among the proteins accumulating the highest number of variants [74]. In contrast to the loss-of-function mutations (e.g., nonsense, frameshift, or large deletions) previously reported in the B.1 lineage [75] and the truncating mutation identified in the A.2.3 lineage [76], the p.Arg280Lys substitution identified in our isolate represents a conserved amino acid change located within the predicted ankyrin repeat domain. This domain typically mediates specific protein–protein interactions and plays a key role in poxviral immune modulation. Although this substitution may not completely disrupt protein folding or stability, it could subtly modulate binding affinity or specificity for host targets—such as certain adapter proteins or transcription factors within the NF-κB pathway—by altering surface electrostatics or local conformation. This “functional fine-tuning” mechanism, together with complete loss-of-function mutations, constitutes a diverse array of evolutionary strategies that OPG023 may employ during adaptation to different hosts or transmission environments. Therefore, continued surveillance and functional validation of this missense mutation will contribute to a more comprehensive understanding of the molecular evolutionary trajectory and adaptive potential of MPXV in the context of sustained human-to-human transmission.
Additionally, our systematic comparative analysis of reference datasets revealed that E.3 and E.4 lineages have each accumulated distinct sets of lineage-specific mutations. Particularly noteworthy is our identification of a high-impact nonsense mutation C72485T (OPG092, p.Glu155*) in the E.4 lineage, which may truncate the late transcription factor VLTF-1 and consequently affect viral late gene transcription and virion assembly—potentially serving as a key molecular marker for this lineage. Simultaneously, the E.4 lineage has fixed several other specific missense mutations, including G41806A (OPG061, p.His27Tyr) in the helicase gene and C94424T (OPG114, p.Asp82Tyr) in the RNA polymerase subunit gene. In contrast, the E.3 lineage demonstrates different evolutionary pathways, with its characteristic mutations such as C38476T (OPG056, p.Glu187Lys) and C84150T (OPG105, p.Leu1009Phe) enriched in DNA replication pathways. Furthermore, existing research indicates that missense mutations can provide adaptive advantages for viruses and may reflect viral adaptation processes to human hosts [7]. The documented preference of MPXV for accumulating nonsynonymous mutations during evolution [7] is consistent with our findings, which show that Sample 1 and Sample 2 contained 50 and 55 nonsynonymous mutations, respectively, with the aforementioned lineage-specific mutations being predominantly missense—thereby supporting their potential adaptive functions. Collectively, these lineage-specific mutations demonstrate non-random genomic distribution, instead showing precise localization within key genes intimately involved in viral replication, transcriptional regulation, and host interactions. This pattern indicates that under shared APOBEC3 editing pressure, E.3 and E.4 lineages have embarked on parallel adaptive evolutionary trajectories by accumulating different adaptive mutations across various functional genes. These mutations not only possess potential utility as lineage markers for molecular tracing but, more significantly, may constitute the molecular foundation for understanding potential phenotypic differences between lineages.
The most exploratory dimension of our investigation emerges from comprehensive DNA methylation analysis of MPXV genomes—representing the first systematic characterization of the DNA modification landscape in clinically isolated MPXV strains. Our methodological approach employed PCR-amplified MPXV whole genomes as unmodified controls and utilized the modkit tool with stringent thresholds (p ≤ 0.05, mod_ratio difference ≥ 0.2) to minimize false-positive signals. Within this rigorous analytical framework, we detected no significant 5mC modifications in either sample but consistently identified both 5hmC and 6mA sites. This distinctive modification profile suggests that MPXV may possess epigenetic regulatory characteristics distinct from host chromatin architecture.
The absence of detectable 5mC modifications in our study may reflect intrinsic characteristics of the MPXV genome. Beyond 5mC, both 5hmC and 6mA represent equally crucial DNA modifications with significant regulatory functions [77]. Our findings indicate that the MPXV genome appears resistant to stable 5mC modification by host DNA methyltransferases, while the consistent detection of 5hmC signals suggests viral DNA exposure to host oxidative modification environments. Since 5hmC is primarily catalyzed by TET family dioxygenases [32], its presence in the MPXV genome likely results from host TET enzyme activity on viral DNA [78], revealing intricate interactions between the virus and the host epigenetic machinery [77].
Particularly noteworthy is our identification of multiple 6mA sites within the MPXV genome. As an evolutionarily conserved and dynamic DNA mark [79], 6mA has been demonstrated to influence chromatin architecture through mechanisms including nucleosome positioning in eukaryotic systems [80]. Its detection in MPXV suggests that analogous regulatory mechanisms may operate on viral genomes, with the widespread presence of 6mA and its specific regulatory enzymes in the human genome providing mechanistic plausibility for this phenomenon [81]. Furthermore, observations of dynamic 6mA accumulation in zebrafish and porcine early embryos [82] indicate this modification can achieve elevated levels in rapidly proliferating cellular environments, offering a valuable analogy for understanding 6mA detection in fast-replicating viral contexts.
Our study observed a significant enrichment of 5hmC and 6mA modifications within the inverted terminal repeat (ITR) regions of the viral genome. This specific spatial distribution pattern may carry multifaceted biological significance. Firstly, ITR regions play a crucial role in poxviral replication and transcription initiation, harboring core sequences essential for replication origin and early transcriptional regulatory elements [83]. The clustering of epigenetic modifications within these regions suggests their potential involvement in the fine-tuning of these fundamental processes. For instance, existing research suggests that 6mA can influence nucleosome positioning by modulating DNA flexibility or steric hindrance [80,84]. Based on this, we hypothesize that 6mA modifications within the ITRs may, through similar mechanisms, alter local DNA conformation or accessibility. This could, in turn, affect the assembly or binding efficiency of the replication machinery, such as DNA polymerase complexes, potentially modulating the timing or efficiency of replication initiation. Secondly, this symmetrical enrichment pattern at both genomic termini (the “dual 5′-end peak” pattern) may reflect structural or functional specialization. The secondary structures (e.g., hairpin loops) formed by ITRs are key features of poxvirus genome termini. Modified bases could influence the stability of these structures or their interactions with host factors. While direct evidence in MPXV is lacking, the symmetry of the pattern implies it is likely an ordered, non-random event. Future studies involving the construction of viral mutants with specific modification site abrogations, combined with functional assays, will be necessary to directly validate the precise roles of these epigenetic marks in MPXV replication, transcription, or immune evasion. Our study identified both conserved and sample-specific DNA modification patterns across the analyzed MPXV genomes, with Sample 2 (E.4 lineage) exhibiting a relatively more complex modification profile under comparable culture conditions. This observation suggests the presence of lineage-associated differences in viral DNA modification patterns and highlights the potential epigenomic complexity of MPXV. Notably, previous Nanopore-based studies have reported putative epigenetic modifications in a virus initially misidentified as a camelpox vaccine strain and later confirmed to be vaccinia virus [29], supporting the technical feasibility of long-read sequencing approaches for detecting viral DNA modifications in orthopoxviruses. In addition, accumulating evidence indicates that DNA 6mA and RNA m^6^A modifications may participate in host–pathogen interactions [85], underscoring the potential biological relevance of viral epigenomic features. Nevertheless, given that the present analysis is based on a limited number of viral DNA samples prepared under specific experimental conditions and lacks multiple independently generated biological replicates, these findings should be interpreted as lineage-associated observations rather than definitive evidence of intrinsic epigenetic divergence or lineage-specific adaptive evolution. The stability, generalizability, and functional significance of these modification patterns therefore remain to be established. Future studies incorporating multiple biological replicates and standardized virus preparation protocols will be essential to substantiate these observations. In addition, orthogonal technical validation using approaches such as PacBio single-molecule real-time sequencing, 5hmC chemical labeling (5hmC-Seal), or 6mA immunoprecipitation (6mA-DIP), together with mechanistic investigations, functional analyses through site-specific mutagenesis, and expanded clinical correlation studies, will be critical for clarifying the biological significance of MPXV epigenomic variation.
In summary, this multi-level investigation integrating genomic variation and epigenetic profiling has revealed distinct characteristics between two MPXV strains, including genetic signatures, APOBEC3 editing preferences, and lineage-associated differences in observed 5hmC/6mA modification patterns under the examined experimental conditions, providing new evidence for understanding MPXV’s continuous adaptive evolution in human populations. Particularly, the modification enrichment in ITR regions, emergence of 5hmC signals, and lineage-specific 6mA sites collectively suggest that viral DNA may be subject to host epigenetic regulation. Nevertheless, it should be noted that this study is based on a limited number of clinical isolates, and the observed patterns require further validation in larger and more diverse MPXV cohorts. Future studies should validate these findings in larger cohorts while employing cell culture models or in vitro infection systems to determine the origin of these modifications and their functional roles in viral replication, gene expression regulation, and transmissibility. Concurrently, further exploration of potential interplay between APOBEC3 editing pressure and DNA modifications will help elucidate the contribution of host immunity to MPXV evolution. Deeper investigation of these aspects will facilitate construction of a comprehensive MPXV epigenomic landscape and provide novel theoretical foundations for developing enhanced viral surveillance and control strategies.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Mc Collum A.M. Damon I.K. Human monkeypox Clin. Infect. Dis.20145826026710.1093/cid/cit 70324158414 PMC 5895105 · doi ↗ · pubmed ↗
- 2European Centre for Disease Prevention and Control Risk Assessment: Monkeypox Multi-Country Outbreak Available online: https://www.ecdc.europa.eu/en/publications-data/risk-assessment-monkeypox-multi-country-outbreak(accessed on 8 November 2025)
- 3Bunge E.M. Hoet B. Chen L. Lienert F. Weidenthaler H. Baer L.R. Steffen R. The changing epidemiology of human monkeypox—A potential threat? A systematic review P Lo S Negl. Trop. Dis.202216 e 001014110.1371/journal.pntd.001014135148313 PMC 8870502 · doi ↗ · pubmed ↗
- 4World Health Organization Monkeypox Fact Sheet Available online: https://www.who.int/news-room/fact-sheets/detail/monkeypox(accessed on 8 November 2025)
- 5Ulaeto D. Agafonov A. Burchfield J. Carter L. Happi C. Jakob R. Krpelanova E. Kuppalli K. Lefkowitz E.J. Mauldin M.R. New nomenclature for mpox (monkeypox) and monkeypox virus clades Lancet Infect. Dis.20232327310.1016/S 1473-3099(23)00055-536758567 PMC 9901940 · doi ↗ · pubmed ↗
- 6Ježek Z. Szczeniowski M. Paluku K.M. Mutombo M. Human monkeypox: Clinical features of 282 patients J. Infect. Dis.198715629329810.1093/infdis/156.2.2933036967 · doi ↗ · pubmed ↗
- 7Zhang S. Wang F. Peng Y. Gong X. Fan G. Lin Y. Yang L. Shen L. Niu S. Liu J. Evolutionary trajectory and characteristics of Mpox virus in 2023 based on large-scale genomic surveillance in Shenzhen, China Nat. Commun.202415745210.1038/s 41467-024-51737-439198414 PMC 11358148 · doi ↗ · pubmed ↗
- 8Kraemer M.U.G. Tegally H. Pigott D.M. Dasgupta A. Sheldon J. Wilkinson E. Schultheiss M. Han A. Oglia M. Marks S. Tracking the 2022 monkeypox outbreak with epidemiological data in real time Lancet Infect. Dis.20222294194210.1016/S 1473-3099(22)00359-035690074 PMC 9629664 · doi ↗ · pubmed ↗
