MPXV RNA-seq data provide evidence for protection of viral transcripts from APOBEC3 editing
Alisa O. Lyskova, Ruslan Kh. Abasov, Anna Pavlova, Evgenii V. Matveev, Alexandra V. Madorskaya, Fedor M. Kazanov, Daria V. Garshina, Anna E. Smolnikova, Gennady V. Ponomarev, Elena I. Sharova, Dmitry N. Ivankov, Ogun Adebali, Mikhail S. Gelfand, Marat D. Kazanov

TL;DR
This study shows that mutations in monkeypox virus RNA likely come from DNA-level changes caused by human APOBEC3 enzymes, not direct RNA editing.
Contribution
The study provides evidence that APOBEC3-induced mutations in monkeypox virus are DNA-based, not RNA-based.
Findings
APOBEC signature substitutions in MPXV RNA are best explained by DNA-level mutagenesis.
Substitutions show a neutral impact on protein-coding sequences and lack RNA-specific hotspot correlations.
Observed mutations overlap with known genomic mutations in MPXV strains.
Abstract
The 2022 outbreak of monkeypox virus (MPXV), a double-stranded DNA virus, is remarkable for an unusually high number of single-nucleotide substitutions compared to earlier strains, with a strong bias toward C→T and G→A transitions consistent with the APOBEC3 cytidine deaminase activity. While APOBEC3-induced mutagenesis is well documented at the DNA level, its potential impact on MPXV RNA transcripts remains unclear. To assess whether APOBEC3 enzymes act on MPXV RNA, we analyzed RNA-seq data from infected samples. The enrichment of APOBEC signature substitutions among high-frequency mismatched positions led us to consider two possibilities: RNA editing at hotspots or fixed DNA mutations. Multiple lines of evidence support the conclusion that these substitutions arise from DNA-level mutagenesis rather than RNA editing. These include a substantial number of G→A substitutions remaining…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig 1
Fig 2
Fig 3
Fig 4
Fig 5- —RSF
- —Assignment
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPoxvirus research and outbreaks · SARS-CoV-2 and COVID-19 Research · Viral Infections and Outbreaks Research
INTRODUCTION
Monkeypox disease, known for a multi-country outbreak in 2022, is caused by the monkeypox virus (MPXV), a double-stranded DNA virus belonging to the Orthopoxvirus genus in the Poxviridae family (1). MPXV was first isolated in 1958 in Copenhagen from a Macaca fascicularis imported from Singapore (2). The first human case of MPXV was documented in 1970 in the Congo Basin (3), demonstrating the virus’s capability for zoonotic transmission from animals to humans. Since then, numerous outbreaks have been reported across Eastern, Central, and Western Africa (4). The symptoms of monkeypox are similar to those of smallpox, which also belongs to the Orthopoxvirus genus along with cowpox, camelpox, mousepox, and vaccinia viruses (5). However, monkeypox typically has a lower mortality rate than smallpox (6). Non-human primates are susceptible to MPXV and can transmit the virus during outbreaks. However, there is no clear evidence supporting long-term maintenance of the virus within primate populations (7). The natural reservoir of MPXV is thought to be rodents and other small animals (8). From 2002 to 2021, several cases of MPXV reported outside of Africa were associated with imports from endemic countries in West and Central Africa (9, 10). In 2022, a rapidly expanding multi-country outbreak affected individuals with no travel history to Africa, suggesting that MPXV may have evolved to enable human-to-human transmission (11). Since the start of the outbreak, more than 90,000 cases have been reported across over 110 countries, leading the World Health Organization (WHO) to declare it a global health emergency.
MPXV is phylogenetically divided into three distinct clades: clade I, recently subdivided into Ia and Ib, represents the most virulent lineage with up to 10% mortality in humans and is predominantly transmitted in the Congo Basin; clade IIa, initially a zoonotic strain with low mortality in West Africa, later adapted to the human-to-human transmission and caused an outbreak in Nigeria during 2017–2018; and clade IIb, associated with the 2022 global outbreak, characterized by its ability to spread through human-to-human transmission (12, 13). The MPXV genome is approximately 197,000 base pairs long and contains about 200 genes (7). A relatively large number of genes are involved in the encoded replication machinery, as the entire viral lifecycle takes place in the cytoplasm of infected cells (14). The MPXV genome includes two telomeres with identical sequences that are oriented in opposite directions, known as inverted terminal repeats (15). Genes responsible for replication and morphogenesis, conserved among poxviruses, are clustered in the central region of the genome, while the genes specific to pathogenesis are located in the terminal regions (16). Studies on the transcriptional program of MPXV categorized its genes into three temporal expression phases: early, intermediate, and late (17). The estimated mutation rate for orthopoxviruses is 1–2 substitutions per genome per year (18); however, the extent of divergence in MPXV genomes from the 2022 outbreak, compared to the related 2018–2019 viruses, has reached approximately 50 single-nucleotide substitutions (13). Most substitutions were C→T transitions or their complementary G→A counterparts, typically occurring in the TC (complementary GA) contexts, respectively (13). This mutational signature, characteristic of APOBEC3 family cytidine deaminases, suggests that the accumulation of these mutations in the MPXV genome could indicate the virus’s evolved ability for human-to-human transmission (13, 19, 20).
Apolipoprotein B mRNA editing catalytic polypeptide-like (APOBEC) is a family of enzymes in mammals that play a key role in the immune system, restricting viruses and retrotransposons (21). Recently, enzymes of the APOBEC3 subfamily have received significant attention for their role in cancer mutagenesis and their potential involvement in the mutagenesis of the SARS-CoV-2 virus (22–24). APOBEC enzymes bind to RNA or single-stranded DNA, catalyzing the deamination of cytosine to uracil. In an RNA molecule, the altered nucleotide remains unchanged, as uridine is a normal RNA base and RNA is not subject to the uracil-removal repair pathways that operate on DNA. In a DNA molecule, uracil can be fixed as thymine in subsequent rounds of replication or, if converted by DNA glycosylases into an abasic site, it may be further repaired by translesion polymerases into either thymine or guanine (25). It has been observed that there exist mutagenesis hotspots of APOBEC enzymes occurring within particular secondary structure patterns of RNA or single-stranded DNA (26). Specifically, a cytosine located at the 3′ end of a hairpin loop would have a 10-fold higher mutation rate by APOBEC enzymes compared to cytosines in TC motifs outside secondary structure (27). A similar effect was observed for APOBEC signature mutations in the MPXV genome, providing additional support for the involvement of APOBEC enzymes (28). While mutations in MPXV genomes associated with the 2022 outbreak have been extensively studied, it remains unclear whether viral messenger RNAs are also affected by APOBEC enzymes.
In this study, we analyzed publicly available MPXV RNA-seq data for evidence of APOBEC mutagenesis in viral transcripts. Analysis of MPXV transcriptomes generated by RNA-sequencing can provide valuable insights into the extent and characteristics of APOBEC mutagenesis. Moreover, RNA-seq studies can generate significantly more statistical data than genome sequencing, as messenger RNAs are produced in multiple copies during gene expression, and each of these molecules can theoretically be targeted by APOBEC enzymes. Our results reveal a moderate number of APOBEC-signature substitutions in MPXV transcripts, which, upon thorough analysis, appear to stem from DNA-level mutagenesis, hence limiting support for APOBEC-mediated RNA editing.
MATERIALS AND METHODS
RNA-seq data sets (Table S1) from the following projects were obtained from the SRA archive: PRJEB60728 (host: Homo sapiens), PRJNA906618 (host: Homo sapiens), PRJEB56841 (host: Chlorocebus sabaeus), PRJNA1183318 (host: Macaca fascicularis), along with DNA sequencing data from projects PRJNA845087 and PRJNA981509 (both host: Homo sapiens). Both RNA-seq and DNA-seq data sets were downloaded using the fastq-dump utility from the SRA Toolkit (version 3.1.1) (29). The reference genome of the MPXV (GCF_014621545.1/ASM1462154v1), as well as host genomes (Homo sapiens: GRCh38.d1.v1, Chlorocebus sabaeus: GCF_015252025.1, Macaca fascicularis: GCA_037993035.1), were obtained from the NCBI RefSeq database (30).
RNA-seq data generated using long-read sequencing technologies were aligned to the reference genome using minimap2 (version 2.28-r1209) (31). Short-read RNA-seq data were aligned using STAR (version 2.7.11b) (32), and genomic DNA short reads were mapped using BWA (version 2.2.1) (33). Alignment quality metrics were computed using mosdepth (version 0.3.9) (34) and the samtools stats command from samtools (version 1.20) (35). Detection of positions with mismatches, representing either genomic variants or potential RNA editing sites, was performed using BCFtools (version 1.21; the exact command-line parameters are provided in Table S2) (35). Mismatch positions were filtered using a substitution frequency threshold of 1% for data sets generated with short-read sequencing, which is approximately ten times higher than the reported sequencing error rate (36). For data sets based on long-read sequencing, a more conservative threshold of 10% was applied due to the higher intrinsic error rate of this technology. Read counts were generated using the HTSeq package (37), and transcript abundance was subsequently quantified in TPM units using the edgeR package (38).
Secondary structure around the identified mismatch positions was predicted using RNAfold (39) and further analyzed with the RNAsselem package (version 1.3) (40). Inverted repeats were identified using Palindrome Analyzer (41). The genomic map was visualized using the circlize package in R. All other plots were generated using ggplot2 (version 3.5.1) and matplotlib (version 3.9.0).
To model the sequence specificity of APOBEC family enzymes (APOBEC3A, APOBEC3B, APOBEC3F, and APOBEC3G), we constructed position weight matrices (PWMs) based on published data reporting experimentally confirmed mutation sites and their sequence contexts (42–44). For each enzyme, we compiled nucleotide sequences flanking known editing sites from the literature and counted the frequency of each nucleotide (A, C, G, T) at each position within a defined 5-nucleotide window (–3, −2, –1, 0, 1) surrounding the mutation site, where position 0 corresponds to the mutated nucleotide. These nucleotide counts were assembled into a matrix, with each row corresponding to a nucleotide and each column to a relative position. The counts in each column were normalized to sum to 1, and the resulting frequencies were log-transformed to generate the final PWM. To score a given sequence position, we summed the logarithmic weights corresponding to the observed nucleotide at each position in the matrix. Additionally, PWMs specifically designed to distinguish APOBEC3A from APOBEC3B were adopted from (45), based on the differential preference for YTCA versus RTCA sequence motifs (the mutated C is underlined).
To assess where APOBEC-signature substitutions were enriched within each sample, we modeled the number of such substitutions using a beta-binomial distribution, which accounts for overdispersion relative to a simple binomial model. Here, an APOBEC-signature substitution was defined as a C→T substitution occurring in a TC context or the complementary G→A substitution in a GA context, consistent with the known sequence preference of APOBEC3 enzymes. Let n be the total number of detected substitutions in the sample, k the number of APOBEC-signature substitutions, and π the unknown underlying probability that a substitution belongs to the APOBEC-signature class. We assumed that π follows a beta prior: π ~ Beta(α, β), where the parameters α and β were estimated from the background (non-APOBEC-signature) substitutions across the sample. Under this model, the likelihood of observing k APOBEC-signature substitutions out of n total substitutions follows the beta-binomial distribution:
where B(,) denotes the beta function. Using the estimated parameters and , we computed the p-value for enrichment as the probability of observing k or more APOBEC-signature substitutions under the fitted beta-binomial model:
This test quantifies whether the observed proportion of APOBEC-signature substitutions exceeds the expectation defined by the background substitution process. Similarly, the beta-binomial model was used to assess enrichment of ADAR-signature substitutions, transcription strand bias, and enrichment within predicted hairpin stem regions.
RESULTS
APOBEC-signature substitutions in MPXV transcripts are observed at a limited number of positions, where they occur at high frequency
To identify RNA editing events in MPXV transcripts, we comprehensively analyzed available RNA-seq data from infected hosts, including two human (46, 47) and two macaque studies (48, 49) (hereafter referred to as RNAseq:human1–2 and RNAseq:monkey1–2), comprising a total of 47 samples. To ensure accurate detection of potential RNA editing events, we mapped sequenced reads to the MPXV genome using three complementary approaches: (i) direct mapping to the MPXV genome, (ii) mapping to the combined host and MPXV genome, and (iii) initial mapping to the host genome, followed by mapping of the unmapped reads to the MPXV genome. Following read mapping, we identified positions with mismatches using BCFtools, allowing for highly sensitive detection of substitutions. To achieve maximum specificity, we set the threshold for substitution frequency as low as possible, but substantially higher than the probability of sequencing errors (see Materials and Methods). We applied this method to three alignments generated for each RNA-seq study and derived the consensus positions (Fig. 1a).
(a) Scheme of the pipeline for detection of variant-like positions in RNA-seq data, which may reflect either genomic variants or RNA editing events. (b) Distribution of single-base substitution types detected in the sample. Bars correspond to observed counts for each substitution class, with candidate APOBEC-induced substitutions highlighted in red. The dotted yellow line marks the expected number of candidate APOBEC-induced substitutions estimated from the beta-binomial model. (c) Distribution of single-nucleotide substitutions in MPXV transcripts across all trinucleotide sequence contexts, grouped by substitution type. Each bar represents the number of observed substitutions occurring in a specific 3-nt motif, with colored panels indicating major substitution classes. (d) Substitution frequencies of all detected mismatch positions in the sample, sorted by increasing frequency. Each bar represents one genomic position, ordered on the x-axis by increasing substitution frequency (y-axis). Colors indicate the substitution class: APOBEC-signature C→T/G→A substitutions in the TC/GA motif (red), the same substitutions outside this motif (extended APOBEC signature; green), and all other substitution types (blue). (e) Enrichment of candidate APOBEC-induced substitutions among high-frequency mismatches in the sample. Bars show the proportions of APOBEC-signature (red) and other (blue) substitutions within two substitution frequency (SF) categories: high-frequency (SF ≥ 0.5) and low-frequency (SF < 0.5) sites. Horizontal dashed lines indicate the expected fraction of candidate APOBEC-induced substitutions under the beta-binomial model. P values above bars quantify the deviation from the expected substitution fractions. High-frequency mismatches show a significant enrichment of APOBEC-signature substitutions (left bar), whereas low-frequency mismatches do not show deviation from expectation (right bar).
Subsequent analysis of the detected transcript positions revealed similar results across the examined projects (Fig. S1 and S2), with Fig. 1 illustrating a representative example using a single sample, ERR11030177, from project (46). In this sample, more than 569 positions with potential RNA editing events were identified using the BCFtools method. We analyzed the fractions of substitution types at the detected positions that could be attributed to APOBEC-signature substitutions, specifically C→T or complementary G→A changes in the TC/GA context. Alternatively, we considered the same substitutions occurring outside of this context as part of the extended APOBEC signature (50). We observed a moderate enrichment of C→T and complementary G→A substitutions (Fig. 1b) and analyzed their two-nucleotide context, revealing that a significant fraction of these substitutions (P = 7.09 × 10^−8^, see Materials and Methods) had occurred within the APOBEC TC/GA motif (Fig. 1b and c). However, the analysis of distribution of substitution frequencies (SF) at detected positions (Fig. 1d) revealed that most APOBEC-signature positions have SF greater than 50% (P = 3.75 × 10^−9^), whereas positions with medium and low SF show no enrichment (P = 0.33) of APOBEC-signature substitutions (Fig. 1e). We assumed that the high frequency of APOBEC-signature substitutions at specific transcript positions could result from either pre-existing DNA-level substitutions or RNA editing hotspots within transcripts. To distinguish between these alternatives, we conducted an additional analysis related to the identified positions. For subsequent analysis, we first filtered substitution positions by substitution frequency (SF > 0.5) to retain the range enriched for APOBEC-signature substitutions and then compiled a non-redundant list of substitution positions observed across all samples in each study, indicating the number of samples in which each position occurred (File S1).
Gene-focused analysis at positions of APOBEC-signature substitutions provides evidence that a significant fraction of substitutions occurred at the DNA level
When APOBEC-signature RNA editing occurs in a transcript, non-stranded RNA sequencing can generate both direct and complementary reads, yielding both C→T and G→A substitutions in the TC/GA context. However, considering gene direction, we can assess whether suspected RNA editing leads to C→T substitutions in genes on the plus strand and G→A substitutions in genes on the minus strand of the double-stranded MPXV DNA virus. We normalized all substitutions according to the gene strand direction, preserving those in plus-strand gene regions and converting those in minus-strand gene regions to their complementary counterparts. Following this normalization, the proportion of G→A substitutions in the human data set decreased from 57.9% to 45.6% (non-human primates: from 55.4% to 53.6%), but still remained substantial (Fig. 2a; Fig. S3a). We conclude that a likely explanation for this is that a significant fraction of substitutions occurred not in RNA transcripts, but at the DNA level before transcription.
(a) Proportion of G→A substitutions with SF >0.5 identified in the RNA-seq:human1–2 data set before and after strand-based normalization of APOBEC-signature substitutions. (b) Distribution of APOBEC-signature substitutions identified in the RNA-seq:human1–2 data set across functional mutation categories, including synonymous, non-synonymous, nonsense, and intergenic substitutions.
Additional support for this hypothesis could come from estimating the effect of APOBEC-signature substitutions on the encoded amino acid sequence of transcripts. If these substitutions occur at the DNA level, deleterious substitutions may severely disturb the viral life cycle. Hence, most observed substitutions should be neutral or, at worst, mildly deleterious. We classified APOBEC-signature substitutions based on their impact on gene function and observed a depletion of non-synonymous substitutions (binomial test: human, P = 4.23 × 10^−3^; non-human primates, P = 8.01 × 10^−3^), which are often deleterious, along with an enrichment of synonymous ones (binomial test: human, P = 2.98 × 10^−3^; non-human primates, P = 1.9 × 10^−4^), which are generally considered neutral (Fig. 2b; Fig. S3b).
APOBEC-signature substitutions do not correlate with MPXV genomic features associated with transcription
To further distinguish between two hypotheses about the observed APOBEC signature substitutions—whether they result from RNA editing at transcript hotspots or from DNA-level substitutions—we analyzed the correlation between the identified genomic positions and known transcription-associated genomic features. If RNA editing predominates, we may expect the detected positions to correlate with transcriptional features, such as gene expression levels (51) or promoter regions. We analyzed the distribution of the identified positions along the MPXV genome (Fig. 3; Fig. S4) and found that APOBEC-signature substitutions, consistently detected by all three applied methods, were relatively uniformly distributed across the genome (Kolmogorov-Smirnov test: human, P = 0.44; non-human primates, P = 0.23). Then, we analyzed potential correlations between the positions and known gene categories classified according to gene expression timing. We used a well-established categorization of MPXV genes into early, intermediate, and late transcription (52) but found no statistically significant difference in the density of positions with APOBEC-signature substitutions among these gene categories (Chi-squared test: human, χ^2^ = 3.3, P = 0.5; non-human primates, χ^2^ = 4.09, P = 0.39). Among other genomic features associated with transcription in MPXV, we examined potential correlations with the positions of transcription start sites (TSS), transcription end sites (TES), and promoters, but found no statistically significant associations (human: TSS, P = 0.37; TES, P = 0.31; promoters, P = 0.13; non-human primates: TSS, P = 0.28; TES, P = 0.37; promoters, P = 0.11). Thus, our analysis of the genomic distribution of APOBEC-signature substitutions relative to transcription-linked genomic features found no significant associations.
Genomic map of identified substitution positions in the MPXV genome from two human-derived projects. The outermost track shows annotated viral genes, color-coded by expression phase: early (green), middle (red), and late (blue). The next layer inward displays the positions of identified substitutions. The height of each pin reflects the SF. Pin color indicates substitution type: APOBEC-signature (red), extended APOBEC-signature (green), and other substitutions (blue). The pinhead shape denotes strand orientation: a circle indicates a C→T substitution on the gene strand, while a square indicates the same substitution on the complementary strand. The innermost track displays the mean coverage across the genome.
Secondary structure at APOBEC-signature substitution sites lacks known APOBEC-associated hotspot patterns
Previous studies have reported that several members of the APOBEC family preferentially mutate TC motifs within specific secondary structure elements (26). An elevated frequency of APOBEC-induced mutations was observed in cytosines located at the 3′ end of hairpin loops, which can form in single-stranded DNA or RNA (27). To analyze whether the genomic positions with APOBEC-signature substitutions detected in our study are also associated with this secondary structure pattern, we applied the bioinformatics tool RNAsselem, which had been previously developed by our group (40). We extended the functionality of RNAsselem, which was initially developed for analyzing experimentally known secondary structures of RNA viruses, to support predicted secondary structures. Using RNAsselem, we analyzed elements of the predicted secondary structure at positions with APOBEC-signature substitutions and compared them to predicted secondary structure patterns at all other positions in the viral genome. We found no association of APOBEC-signature substitution sites with hairpin loop structures, nor did we observe any statistically significant differences between secondary structures at APOBEC-signature positions and at other positions in the MPXV genome (Fig. 4a).
(a) Distribution of predicted secondary structure elements at the positions of observed substitutions in RNA-seq:human1–2 data set and at all other genomic positions. No statistically significant enrichment of substitutions is observed in hairpin loops—structures previously associated with APOBEC mutational hotspots. (b) Distribution of APOBEC-signature substitutions identified in the data sets analyzed in this study and in previous works, mapped onto inverted repeat elements in the MPXV genome that are potentially capable of forming hairpin structures. No significant difference is observed between substitution sites and the rest of the genome. (c) Venn diagram showing the overlap of APOBEC-signature substitutions identified in human and monkey RNA-seq data sets analyzed in this study. (d) Venn diagram showing the intersection of APOBEC-signature substitutions detected across all RNA-seq and DNA-seq data sets analyzed in this study and mutations reported in MPXV strains from O'Toole et al. and Isidro et al.
It was recently reported that mutations observed during the 2022 MPXV outbreak are enriched in inverted repeat (IR) regions (28). However, our analysis of APOBEC-signature substitution sites within hairpin structures—which inherently include IR regions—contradicts this finding. To clarify this discrepancy, we extended our RNA-seq data set from human and non-human primate hosts by including two additional mutation sets obtained from DNA sequencing data presented in (53) (hereafter referred to as DNA-seq:human1–2), as well as two published MPXV mutation data sets from the 2022 outbreak (13, 20). We then applied the Palindrome Analyser tool used in reference 28. Our results (Fig. 4b) showed no statistically significant enrichment of APOBEC-signature substitution sites in IR regions of the MPXV genome in either the RNA-seq or DNA-seq data sets (Chi-square test: RNA-seq:human1, P = 0.45; RNA-seq:human2, P = 0.16; RNA-seq:monkey1, P = 0.5; RNA-seq:monkey2, P = 0.36; DNA-seq:human1, P = 0.62; DNA-seq:human2, P = 0.05; Isidro et al. (13), P = 0.5; O'Toole et al. (20), P = 0.38). Moreover, we did not observe any trend suggesting such enrichment.
MPXV genome positions with APOBEC-signature substitutions significantly overlap across studies, reflecting pre-existing mutations rather than RNA editing hotspots
We also compared the lists of positions with APOBEC-signature substitutions identified in the analyzed RNA-seq data sets, DNA sequencing studies, and known MPXV mutation data sets (13, 20). A substantial overlap of substitution sites was observed across different data sets (Fig. 4c). Specifically, 53 of 80 sites (66%) overlapped between two human RNA-seq studies, 48 of 69 sites (70%) between two non-human primate studies, and 44 out of 89 sites (49%) were common across all four RNA-seq data sets. Comparison with the comprehensive catalog of MPXV mutations from the 2022 outbreak (13, 20) revealed that 52 positions (58%) overlapped with those found in the RNA-seq data (Fig. 4d). Notably, 38 of 41 positions (92%) reported in the earlier study by Isidro et al. (13) were also present in the union of sites detected in RNA-seq data sets and the O’Toole catalog (20).
To determine whether the observed substitutions occurred during the experiment or were already present, information on the specific MPXV strain used in each study was required. However, among the data sets considered, only the RNAseq:human2 project (PRJNA906618) provided information on the specific MPXV strain used. For this project, we excluded from the list of detected substitutions known mutations of this strain, resulting in the exclusion of 54 out of 77 substitutions. For other projects, we excluded mutations listed in the comprehensive catalog of mutations observed in MPXV strains (20). Thus, we obtained a conservative estimate of mutations that occurred during the experiments conducted in the studies considered and were not present in the strains used (File S2). Although the number of these mutations is relatively small, there is a clear enrichment of APOBEC-signature mutations (22 out of a total of 49, P = 2.19 × 10^−8^). The distribution of these mutations across the projects is not uniform: three out of six projects contain most of the mutations, while the remaining three have very few or none.
Nucleotide context of substitutions suggests APOBEC3A or APOBEC3B as the likely enzymes responsible for APOBEC-signature mutagenesis
We further analyzed the trinucleotide context of APOBEC-signature substitutions identified in RNA-seq and DNA-seq studies (Fig. 5a). The most prevalent motif was TCG/CGA, accounting for 34% of all C→T/G→A substitutions in the human RNA-seq data, 38% in the non-human primates RNA-seq, and 35% in the DNA-seq data sets. This was followed by the TCA/TGA motif, which represented 30% of substitutions in the human RNA-seq, 28% in the non-human primates RNA-seq, and 28% in DNA-seq samples. The TCT/AGA motif was the third most common motif, contributing 20% of substitutions in both human and non-human primates RNA-seq data sets and 21% in DNA-seq data. The fourth most frequent motif, TCC/GGA, accounted for only ~7% of substitutions—substantially lower than other TC-containing motifs. Notably, this distribution differs from patterns observed in APOBEC-related cancer studies (50), where mutations in cytosines in the TCA motif are typically the most prevalent.
(a) Trinucleotide sequence context of C→T/G→A substitutions detected in RNA-seq and DNA-seq data sets from MPXV-infected human and monkey samples. (b) Correlation between the observed nucleotide context of substitutions and the known sequence preferences of APOBEC3 family enzymes. (c) Discrimination between the A3A- and A3B-associated mutational profiles using YTCA vs RTCA motif analysis, where YTCA is preferentially targeted by A3A and RTCA by A3B.
We then analyzed the extended nucleotide context of APOBEC-signature substitutions to compare sequence profiles of these substitutions with known specificity models of APOBEC family members. First, we constructed PWMs representing the sequence preferences of four APOBEC enzymes—A3A, A3B, A3G, and A3F—using available mutation data (see Materials and Methods). We then generated PWMs from the identified APOBEC-signature substitutions. The comparison of these matrices (Fig. 5b) revealed that the observed substitution profile most closely resembles the specificities of A3A and A3B. To differentiate between A3A and A3B, we conducted a YTCA vs RTCA motif enrichment analysis (45), as these motifs are well-established signatures distinguishing these two enzymes. This analysis, applied across all data sets, consistently showed an enrichment of the YTCA motif (Fig. 5c), implicating A3A as the likely source of the observed mutations.
To complement this sequence-based inference, we additionally assessed the expression levels of APOBEC family members using RNA-seq data (Fig. S5). In human infection data sets, we detected substantial expression of APOBEC3B, whereas APOBEC3A expression was essentially absent. In contrast, in the non-human primate data sets, APOBEC3A was prominently expressed, whereas APOBEC3B was also detectable but consistently at much lower levels than APOBEC3A. Thus, while at least one of the two candidate enzymes (APOBEC3A or APOBEC3B) was expressed in all infected samples, the expression patterns differed between human and monkey data sets. Taken together, although both the sequence context analysis and motif enrichment support APOBEC involvement, the heterogeneous expression profiles across data sets prevent a definitive assignment of the observed mutagenesis to either APOBEC3A or APOBEC3B alone. These results align with previous observations for MPXV mutations (28) but contradict an experimental study (54) suggesting APOBEC3F as the enzyme responsible for APOBEC-signature mutations in MPXV.
Assessment of potential ADAR-mediated RNA editing signatures
In addition to APOBEC-signature substitution patterns, we observed a pronounced excess of A→G substitutions across a substantial fraction of analyzed samples. Statistical analysis using the beta-binomial model (see Materials and Methods) demonstrated that A→G substitutions were significantly enriched in nearly half of the samples (16 out of 35 infected samples, Table S3). Since A→G substitutions are a canonical signature of RNA editing mediated by ADAR enzymes, we next investigated whether this enrichment is consistent with known molecular characteristics of ADAR activity. ADARs catalyze the deamination of adenosine to inosine in double-stranded RNA (dsRNA), which is subsequently read as guanosine during sequencing, resulting in apparent A→G mismatches. To evaluate whether the observed A→G substitutions are consistent with ADAR-mediated RNA editing, we performed two complementary analyses.
First, we assessed strand bias with respect to gene orientation. Genuine ADAR-mediated editing is expected to preferentially manifest as A→G substitutions on the coding strand, rather than as the complementary T→C substitutions, when variants are interpreted in the context of annotated transcript directions. We therefore compared the frequencies of A→G and T→C substitutions after assigning variants to the sense and antisense strands of genes. Despite the statistically significant enrichment of A→G substitutions observed in many samples, no detectable bias toward A→G substitutions on the coding strand was observed (Table S3). The relative contributions of A→G and T→C substitutions were comparable and did not show a statistically significant asymmetry.
Second, given that ADAR activity is strongly associated with dsRNA structures, we investigated whether A→G substitutions were enriched within predicted dsRNA regions. Using a previously generated set of predicted RNA hairpin structures (see Materials and Methods), substitutions were classified according to their occurrence in stem (double-stranded) or non-stem (single-stranded) regions. We then assessed whether A→G substitutions were enriched in stem regions relative to other substitution types. No enrichment of A→G substitutions within dsRNA stems was detected, and their distribution between stem and non-stem regions did not deviate from random expectation (Table S3). Taken together, these analyses indicate that, although A→G substitutions show statistically significant enrichment in nearly half of the samples, their strand distribution and structural context are inconsistent with the canonical signatures of ADAR-mediated RNA editing. This suggests that the observed A→G enrichment is unlikely to be driven by ADAR activity.
DISCUSSION
Interaction between viruses and host proteins plays a crucial role in the viral life cycle. Viruses typically do not encode all proteins required for their life cycle, relying instead on host cellular proteins to perform essential functions. In contrast to interactions with recruited host proteins, viruses also engage in a constant battle with cellular defense factors—proteins whose primary function is to restrict viral replication. When a virus circulates within a host species for an extended period, it typically evolves mechanisms to evade cellular defenses. However, zoonotic transmission to a new host is usually associated with intensive interactions between the virus and the host’s antiviral proteins. This was partially observed during the SARS-CoV-2 pandemic and has become more pronounced in the recent monkeypox outbreak. Specifically, an accelerated accumulation of mutations exhibiting the signature of the APOBEC3 subfamily of viral restriction factors was observed in monkeypox virus strains circulating in humans during the 2022 outbreak (13, 20). Given that APOBEC3 enzymes can target both ssDNA and RNA (21), and that MPXV genomes and transcripts are localized in the cytoplasm during the viral life cycle (7), it remains an open question whether MPXV transcripts are also subject to APOBEC-mediated mutagenesis. In this study, we analyzed publicly available MPXV RNA-seq data to address this issue.
We considered RNA-seq data from two human and two non-human primate studies and developed a pipeline for detecting potential RNA editing events. Our analysis yielded consistent findings across all considered data sets, revealing a moderate enrichment of APOBEC-signature substitutions in MPXV transcripts. However, the majority of these substitutions occurred at high frequencies at specific positions within transcripts, leading us to conclude that either they are fixed mutations at the DNA level or that they are RNA editing events occurring recurrently at specific hotspots. To distinguish between these two possibilities, we performed a detailed analysis of the identified APOBEC-signature substitutions. Firstly, as APOBEC3 catalyzes C→T editing in mRNA, we found that many G→A substitutions, which are complementary to C→T changes, remained even after normalization relative to the gene strand orientation. Secondly, we found that the observed substitutions were more frequently located at synonymous positions than at non-synonymous ones. Thirdly, we analyzed the distribution of these substitutions along the MPXV genome and found no correlation with the locations of genes expressed at different stages of the transcriptional program (early, intermediate, or late). Fourthly, secondary structure patterns previously associated with APOBEC mutational hotspots were not observed at the substitution sites.
Finally, we compared the sets of identified APOBEC-signature substitutions with known mutation sites in MPXV strains and found a substantial overlap. This overlap may, in part, reflect the presence of pre-existing mutations in the analyzed samples, as the specific viral strains used in the RNA-seq studies were not clearly specified. Taken together, these observations led us to conclude that the identified APOBEC-signature positions represent substitutions that occurred at the DNA level rather than RNA editing events. At the same time, we cannot completely exclude the possibility of low-level APOBEC-mediated RNA editing. RNA-seq-based detection of RNA editing is inherently limited by sequencing depth, technical error rates, and alignment uncertainty, particularly when the expected frequency of editing events approaches that of background sequencing errors. Therefore, rare or transient APOBEC-mediated RNA editing events occurring at low frequencies may remain below the detection threshold of standard RNA-seq analyses. While our results argue against widespread or biologically significant APOBEC activity at the RNA level in MPXV transcripts, they do not preclude the existence of sporadic or low-abundance editing events that are difficult to resolve with current data.
In addition to APOBEC-associated substitution patterns, we observed a statistically significant enrichment of A→G substitutions in a substantial fraction of samples. Despite this enrichment, the substitutions did not exhibit key molecular hallmarks of ADAR-mediated RNA editing, including transcription strand bias and preferential localization within double-stranded RNA structures, suggesting that ADAR activity is unlikely to underlie the observed signal. We also investigated which member of the APOBEC3 subfamily is most likely responsible for the observed DNA-level mutagenesis. Given that members of the APOBEC3 subfamily differ to some extent in their substrate specificity, we modeled the specificity via position-weight matrices using existing data for the A3A, A3B, A3F, and A3G enzymes. By comparing these models to the specificity profile derived from the observed data, we found consistently across all data sets that the sequence context of the substitutions best matches the specificity of A3A and A3B enzymes. Distinguishing between A3A and A3B specificity had been extensively considered in the context of cancer data (45), and here we applied a similar approach, identifying A3A as the likely mutator. However, expression analysis showed that the relative abundance of APOBEC3A and APOBEC3B varied across samples: in some data sets, APOBEC3B was the predominantly expressed enzyme; in others, APOBEC3A was more strongly expressed, and in a subset of samples, both enzymes were co-expressed. This heterogeneity indicates that either A3A or A3B could, in principle, contribute to the observed APOBEC-signature mutations depending on the specific infection context. The known subcellular localization of A3A and A3B is more compatible with A3A activity, as A3B is predominantly localized in the nucleus, whereas A3A has been detected in both the nucleus and cytoplasm (55).
Although members of the APOBEC family are known to restrict a wide range of viruses, including both positive- and negative-sense single-stranded RNA viruses, two distinct mechanisms may be involved: a deaminase-dependent pathway, which leads to C-to-T hypermutation, and a deaminase-independent pathway, in which APOBEC proteins inhibit viral replication through direct binding (56). The deaminase-independent mechanism has been experimentally confirmed for both DNA and RNA viruses. Although the deaminase-dependent mechanism is more frequently associated with DNA viruses and retroviruses, evidence for its occurrence in RNA viruses (57, 58) includes reports of APOBEC-signature mutations in the positive-sense RNA virus SARS-CoV-2 (59, 60).
Unlike RNA viruses, poxviruses replicate exclusively in the cytoplasm but generate single-stranded DNA intermediates during replication and repair, which represent plausible substrates for APOBEC3A and APOBEC3B. The APOBEC-associated mutational signature observed in contemporary MPXV genomes, therefore, likely reflects ongoing host-driven editing rather than replication errors intrinsic to the virus. Although the APOBEC signature does not directly determine clinical outcome, APOBEC-induced mutations may influence viral fitness, host adaptation, and lineage diversification, potentially affecting transmission dynamics during outbreaks. In this context, the enrichment of APOBEC-signature mutations in MPXV genomes provides insight into the interplay between host restriction mechanisms and viral evolution and may help explain the emergence of MPXV lineages with distinct epidemiological characteristics. Since APOBEC enzymes are known to mutate both viral DNA and RNA, the mechanisms preventing hypermutation of viral RNA transcripts remain poorly understood and require further research.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Lum F-M, Torres-Ruesta A, Tay MZ, Lin RTP, Lye DC, Rénia L, Ng LFP. 2022. Monkeypox: disease epidemiology, host immunity and clinical interventions. Nat Rev Immunol 22:597–613. doi:10.1038/s 41577-022-00775-436064780 PMC 9443635 · doi ↗ · pubmed ↗
- 2Magnus P von, Andersen EK, Petersen KB, Birch‐Andersen A. 1959. A pox‐like disease in cynomolgus monkeys. Acta Pathol Microbiol Scand 46:156–176. doi:10.1111/j.1699-0463.1959.tb 00328.x · doi ↗
- 3Ladnyj ID, Ziegler P, Kima E. 1972. A human infection caused by monkeypox virus in Basankusu territory, democratic republic of the congo. Bull World Health Organ 46:593–597.4340218 PMC 2480792 · pubmed ↗
- 4Monzón S, Varona S, Negredo A, Vidal-Freire S, Patiño-Galindo JA, Ferressini-Gerpe N, Zaballos A, Orviz E, Ayerdi O, Muñoz-Gómez A, et al.. 2024. Monkeypox virus genomic accordion strategies. Nat Commun 15:3059. doi:10.1038/s 41467-024-46949-738637500 PMC 11026394 · doi ↗ · pubmed ↗
- 5Huang Y, Mu L, Wang W. 2022. Monkeypox: epidemiology, pathogenesis, treatment and prevention. Signal Transduct Target Ther 7:373. doi:10.1038/s 41392-022-01215-436319633 PMC 9626568 · doi ↗ · pubmed ↗
- 6Mc Fadden G. 2005. Poxvirus tropism. Nat Rev Microbiol 3:201–213. doi:10.1038/nrmicro 109915738948 PMC 4382915 · doi ↗ · pubmed ↗
- 7Alakunle E, Kolawole D, Diaz-Cánova D, Alele F, Adegboye O, Moens U, Okeke MI. 2024. A comprehensive review of monkeypox virus and mpox characteristics. Front Cell Infect Microbiol 14:1360586. doi:10.3389/fcimb.2024.136058638510963 PMC 10952103 · doi ↗ · pubmed ↗
- 8Karagoz A, Tombuloglu H, Alsaeed M, Tombuloglu G, Al Rubaish AA, Mahmoud A, Smajlović S, Ćordić S, Rabaan AA, Alsuhaimi E. 2023. Monkeypox (mpox) virus: classification, origin, transmission, genome organization, antiviral drugs, and molecular diagnosis. J Infect Public Health 16:531–541. doi:10.1016/j.jiph.2023.02.00336801633 PMC 9908738 · doi ↗ · pubmed ↗
