A pentose, as a cytosine nucleobase modification in Shewanella phage Thanatos genomic DNA, mediates enhanced resistance toward host restriction systems
David Brandt, Anja K. Dörrich, Marcus Persicke, Alina Kemmler, Tabea Leonhard, Markus Haak, Sophia Nölting, Matthias Ruwe, Nicole Schmid, Kai M. Thormann, Jörn Kalinowski

TL;DR
A new cytosine modification in a Shewanella phage helps it resist host defense systems, including CRISPR/Cas enzymes.
Contribution
Discovery of a novel deoxypentose modification on cytosine in phage DNA that confers resistance to restriction enzymes and CRISPR/Cas systems.
Findings
Shewanella phage Thanatos DNA is resistant to multiple restriction enzymes and Cas cleavage.
A deoxypentose modification on cytosine was identified as a novel DNA modification in phage DNA.
The modification is essential for phage propagation and likely aids in evading host defense systems.
Abstract
Co-evolution of bacterial defense systems and phage counter-defense mechanisms has resulted in an intricate biological interplay between bacteriophages and their prey. In order to evade nuclease-based mechanisms that target DNA, various bacteriophages modify their nucleobases, which impedes or even inhibits the recognition and restriction by endonucleases. We found that Shewanella phage Thanatos DNA is insensitive to multiple restriction enzymes and also to Cas I-Fv and Cas9 cleavage. Furthermore, with nanopore sequencing, the phage DNA showed severely impaired basecalling. In addition to an adenine methylation, the data indicated an additional, much more substantial nucleobase modification. Using liquid chromatography-mass spectrometry (LC-MS), we identified an unknown configuration of a deoxypentose attached to cytosine as an undiscovered modification of phage DNA, which is present in…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig 1
Fig 2
Fig 3
Fig 4
Fig 5
Fig 6| Locus tag | Annotated gene function | EC number | Amino acid length |
|---|---|---|---|
| TH1_075 | dCTP pyrophosphatase | – | 173 |
| TH1_119 | dCMP deaminase | 3.5.4.12 | 174 |
| TH1_062 | Deoxycytidylate 5-hydroxymethyltransferase/ | – | 237 |
| TH1_060 | Hypothetical protein with glycosyltransferase and phosphotransferase domains | – | 561 |
- —European Regional Development Fundhttp://dx.doi.org/10.13039/501100008530
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBacteriophages and microbial interactions · Insect symbiosis and bacterial influences · Bacterial Genetics and Biotechnology
INTRODUCTION
Bacteriophages employ a large arsenal of trickery to counter the various bacterial defense systems. It is well established that phages possess modified DNA nucleobases since the discovery of epigenetic modifications in T-even bacteriophages (1). To date, multiple chemical base modifications and the respective enzymatic pathways, which are normally phage-encoded, have been identified and characterized (2). Alternative bases can be identified and analyzed using mass spectrometric methods as liquid chromatography-mass spectrometry (LC-MS) on digested DNA (2, 3) or, to a limited extent, by direct sequencing of native DNA using third-generation sequencing techniques like SMRT or nanopore sequencing (4–7).
One of the most extensively studied phages, Escherichia coli phage T4, regularly adds a hydroxymethyl group to the C-5 of its cytosine residues (hmC), which acts as a protection against bacterial anti-viral defense mechanisms. A common modification pathway in T-even bacteriophages further involves the glycosylation of hydroxymethylated deoxycytosine (dC), where up to two glucose residues are added to hmC by α- and β-glycosyltransferases in the T4 phage (2). This particular covalent DNA modification alone provides resistance to E. coli restriction-modification (R-M) systems and, in addition, the T4 DNA modification also decreases the efficiency of CRISPR-Cas9 cleavage, which, however, probably depends on the composition of the targeting and tracr RNAs (8, 9). As a second line of defense, E. coli T4 possesses a DNA adenine methyltransferase, which methylates GATC sites and also protects the phage DNA from nuclease cleavage (10–12). Further epigenetic sugar modifications of cytosine by arabinose-derived molecules have been identified in phages targeting E. coli and other Gram-negative species (13, 14) (see Fig. S1). As the canonical dC decoration found in T4, these modifications provide a protection against multiple DNA-targeting phage-defense systems of the host (14). These discoveries strongly indicate that more DNA modifications exist that are yet elusive.
The lytic phage Shewanella phage Thanatos-1 belongs to the Tevenvirinae and shares the typical head-tail morphology (15). The phage has a dsDNA genome of 160.6 kbp encoding about 206 proteins, many of which are of unknown function. An apparent recalcitrance of the phage DNA toward restriction indicated that one or more of the nucleobases may be modified (15). Here, we investigated Shewanella phage Thanatos with regard to modified DNA nucleobases using nanopore sequencing, as well as LC-MS. We characterized a DNA methyltransferase and identified a so far non-described cytosine modification, which acts as an efficient countermeasure against host nuclease-based defense systems targeting the phage DNA.
RESULTS
Shewanella phage Thanatos-1 dsDNA is resistant to restriction endonucleases, as well as Cas9 and Cas-IF cleavage
Shewanella phage Thanatos-1 (Thanatos in the following) was isolated using S. oneidensis MR-1 as host organism and was hence able to infect and lyse this species (15). S. oneidensis possesses a type II R-M, which has been shown to decrease the plasmid transformation rate (16), which was obviously not highly efficient toward infection by Thanatos. To determine if Shewanella phage Thanatos is generally susceptible to restriction by type II restriction enzymes, we used Thanatos DNA as substrate for various commercially available enzymes. An exemplary assay is shown in Fig. S2. We found that the Thanatos DNA was resistant to restriction toward a range but not all of the type II restriction enzymes. Apparent digestion only occurred for enzymes, where no C/G is part of the cut site (EcoRV) or is present in the recognition motif (SspI).
Another nuclease-based phage defense system that is widely present among bacteria and archaea is the CRISPR-Cascade system (17, 18). Here, the nuclease complex is specifically guided toward potentially invading nucleic acids via short (cr)RNA fragments (19). To determine if Thanatos is susceptible to this phage-defense system, and to potentially set up a genetic system to modify the Thanatos genome in vitro, we employed two different types of CRISPR-Cas systems. One was the I-Fv system from the closely related S. putrefaciens CN-32 (20), while the second was based on the paradigm Cas9 system from Streptococcus pyogenes (21). To this end, the Sp I-Fv system comprising the genes cas3, cas5, cas7, and cas6f was placed under control of the arabinose-inducible P_ara_ promoter on a broad-host range plasmid, and the corresponding guide RNA was separately expressed from a second plasmid also controlled by the P_BAD_ promoter. For the type II Cas9 system, we adopted a system established for Bacillus, which is based on a single plasmid expressing Cas9 along with the tracr and guide RNAs required for activity (22).
Suitable regions for CRISPR-Cascade interference in phage genomes are defined by the so-called protospacer-adjacent motif, which for both used systems is a “GG” at the 3′-end of the used protospacer. For the Sp I-Fv system, seven protospacer regions within the Thanatos chromosome were used (see Fig. 1). Two are located within the genes TH1_010, encoding the major capsid protein and the gene TH1_020, encoding a tail lysozyme; the others targeted gene regions of unknown functions. The regions were chosen to represent high to low G/C contents. As a positive control, we chose two protospacer regions in the genome of S. oneidenis phage Lambda (LambdaSo) within the genes SO_2963 (encoding the major capsid protein) and the gene SO_2975 (encoding a protein with unknown function). We previously showed that LambdaSo, which harbors unmodified DNA, is a temperate phage that results in effective lysis in S. oneidensis MR-1 strains lacking the prophage and the phage’s integration site (23, 24). The vectors expressing the Cas genes and the gRNA, respectively, were co-transformed into a S. oneidensis host strain susceptible to both LambdaSo and Thanatos (S. oneidensis ΔLambdaSo ΔMuSo2), which contained no active prophages that may interfere with scoring phage activity by plaque formation. This strain was generally used for all infection experiments. Expression of the respective CRISPR-Cas system was then induced by the addition of arabinose. Phage infection efficiency was then determined by soft agar overlay assays applying sequential dilutions of phage particles (Fig. 1). We found that LambdaSo infection and plaque formation were completely abolished in the presence of the CASCADE complex and either of the two LambdaSo protospacers, but not when Thanatos protospacers were used (Fig. S3). In contrast, no CRISPR interference was observed when the cells were infected with phage Thanatos, regardless of which protospacer was added, indicating that the I-Fv Cascade system has none or only low efficiency in cutting Thanatos DNA.
Shewanella phage Thanatos exhibits resistance against CRISPR-Cas systems. Shown are spot assays of S. oneidensis expressing the CRISPR-Cas I-fv from S. putrefaciens CN-32 (Cas3; left panels) or Cas9 from Streptococcus pyogenes (Cas9; right panels). In the upper two panels, the cells were exposed to increasing dilutions (steps of 1:10) of Shewanella phage LambdaSo, and in the lower panels to Shewanella phage Thanatos, accordingly. In each experiment, the cells in the upper panel expressed the corresponding CRISPR-Cas system without any guiding RNA (w/o; negative control). In the lower panel(s), a guide RNA targeting the indicated gene was co-produced. The results show that Cas3 from the S. putrefaciens IF-v system, as well as Cas9, efficiently decreases lysis by LambdaSo (as a positive control), while no effect occurs on Shewanella phage Thanatos. Shown are representative assays of three independent experiments.
Also, the Cas9-based system in S. oneidensis showed a significant effect on infection by phage Lambda (Fig. 1). While not as effective as the I-Fv system, the occurrence of LambdaSo-derived phage plaques decreased by three to four orders of magnitude, indicating that the Cas9 system is generally active in S. oneidensis. However, no effect was observed upon infection by Thanatos, as the number of CFU in assays with Cas9 was indistinguishable from those of the negative controls without guide RNA. Thus, as Cas3 of the I-Fv system, Cas9 is significantly less effective against Thanatos. Taken together, the results show that different bacterial DNA restriction systems do not effectively inhibit Thanatos infection, suggesting that the phage’s DNA is modified in a way that prevents nuclease function.
Characterization of an adenosine 6mA methyltransferase specifically active on NA*TC sequence motifs
During previous genome annotation (15), we discovered a gene (TH1_126) coding for a putative DNA adenine methyltransferase of 273 amino acids. Protein sequence similarity (PSI-BLAST) analysis showed that the most closely related proteins are putative methyltransferase genes from metagenome-assembled genomes, classified as Siphoviridae sp. viruses with a maximum identity of 39.05% (95% query coverage). We aimed to produce this putative phage methyltransferase protein in the methylation-free K-12-derived E. coli strain ER3413 (25) to determine a potential methyltransferase activity in vivo. TH1_126 was cloned into the arabinose-inducible pBAD vector system (26). Empty vector ER3413 and ER3413 harboring pBAD::TH1_126 were cultivated on LB agar plates containing 0.1% (wt/vol) arabinose. After overnight cultivation and DNA extraction, nanopore sequencing of native DNA was performed. Data analysis was carried out according to a workflow based on tombo (27) as described in the Materials and Methods. Motif detection from 100 sequences (length: 10 bp), which were found to be most strongly modified, yielded a conserved 5′-ATC-3′ motif in 86 sequences (Fig. 2). Since MEME was configured to search for a conserved motif on both strands of a given sequence, it reverse-complemented 41 of 86 sequences, which in turn means that 5′-GAT-3′ is also enriched. The remaining 14 sequences did not match, which may be due to the unspecific activity of the methylase upon overproduction.
De novo detection of modified sequence motifs from nanopore raw data generated by overexpression of TH1_126 in E. coli ER3413. The tool MEME (21) was used to derive a motif from the 100 most strongly modified sequences of length 10. Eighty-six sequences contain the motif “ATC” or the reverse complement “GAT.”
In the following, we focused on the 5′-ATC-3′ orientation of the detected motif. The control containing only the empty vector plasmid did not yield any conserved motif from the respective sequencing reads in comparison (not shown). We explored up- and downstream extensions of the ATC motif to check for the influence of adjacent bases on the methylation of ATC and found that there likely is no preference for any base upstream of the 5′-ATC-3′ motif (for details, see Fig. S4).
The derived methylation site 6mATC overlaps with the recognition site profile of DpnI, which is known to cleave fully methylated GATC sites and to a certain extent also CATC and GATG motifs (28). We did not find any restriction enzyme in REBASE (29) that is known to specifically recognize and methylate a TATC or AATC motif. Therefore, to assess the validity of the bioinformatically derived methylation site, we employed a genomic DNA digestion assay with DpnI, which provided additional proof for the methylation of CATC and GATC motifs (Fig. S5).
We further conducted a PSI-BLAST search against the S. oneidensis MR-1 genome sequence with TH1_126 as the input sequence to find possible homologous sequences. We identified SO_0289, which is annotated as Dam family site-specific DNA-(adenine-N6)-methyltransferase, to be a distant homolog of TH1_126 (query coverage 72%, % identity 24.4%), which may point to a common evolutionary origin of both phage and host methyltransferases.
Nanopore sequencing of native Thanatos genomic DNA yields low-quality reads
Since nanopore sequencing has been successfully used for the characterization of DNA modification sites (4, 24), we aimed to employ this technique not for re-sequencing but to elucidate possible DNA modifications of Shewanella phage Thanatos. To generate additional data of an unmodified reference, we used the RPB-004 sequencing kit, which includes PCR amplification prior to sequencing after random transposase fragmentation and adaptation.
Mean quality scores per read clearly differed between native DNA sequencing and PCR-based sequencing (25), with medians around 7.5 and 10–12.5, respectively. Converted to percent wrongly called bases, this corresponds to about 20% in the native DNA library and as little as 5% in the PCR-based library.
The guppy basecaller provides three basecalling models, employing different algorithms, namely the fast, high-accuracy, and super-accurate models, which we all used for basecalling. The models differ in achieved raw read accuracy and computation time. Independent of the employed basecalling model, native DNA and PCR-amplified DNA yield sequencing reads of varying quality (Fig. 3). Due to the surprisingly low quality of sequenced native DNA and the big difference to PCR-amplified DNA, we figured that modified DNA may play a role in hindering high-quality nanopore sequencing. We therefore aimed to investigate the genetic repertoire of Thanatos, mainly searching for gene products potentially involved in DNA modification.
Quality score (qscore) distribution of nanopore sequencing reads as determined by guppy basecaller v5.0.11 using different basecalling modes. Boxplots of average per-read quality scores of sequenced native and PCR-amplified DNAs are shown.
Novel cytosine modification is uncovered by mass spectrometric analyses
Cytosine- or adenosine-specific DNA methylation is not known to impair nanopore sequencing to the extent observed during sequencing of native Thanatos DNA (Fig. 4). Therefore, we hypothesized that there is an additional DNA modification, which would also be consistent with the resistance of Thanatos DNA to restriction endonucleases and reduced Cas cleavage.
MS analysis of Thanatos DNA. (A) Chromatogram of liquid chromatography analysis with single nucleotides from Thanatos genomic DNA. (B) Mass spectra after isolation and tandem MS fragmentation of m/z 360.10 measured in positive ionization mode after dephosphorylation of nucleotides to nucleosides using alkaline phosphatase.
To characterize the nature of potential DNA modifications, we employed mass spectrometry. After phage cultivation, we isolated and concentrated the phage fraction using anion-exchange chromatography. After digestion to single nucleosides, we applied liquid chromatography and mass spectrometry. The chromatogram peaks corresponding to dTMP (321.06 m/z) and dAMP (330.07 m/z) were comparable in peak area, while the dCMP peak was considerably smaller than the dGMP (346.08 m/z) peak. Additionally, a double peak with an m/z of 438.094 was apparent. This unknown peak started to elute after 19.5 min and lasted until 21 min (Fig. 4). Its larger mass-to-charge ratio suggested a deoxypentose sugar bound to the cytosine base via an O-glycosidic bond. For further elucidation, we dephosphorylated the nucleotide sample and used tandem MS to fragment the unknown compound in positive ionization mode. After fragmentation, peaks were detected at 128.0450 m/z and 244.0901 m/z. The first peak (128.0450 m/z) most likely corresponds to cytosine with a positively charged oxygen in the C-5 position, which possibly stems from a broken O-glycosidic link to a pentose. The second peak (244.0901 m/z) most likely represents the cytosine still bound to the deoxypentose and only devoid of the deoxyribose from the DNA backbone.
To characterize the pentose configuration of Shewanella phage Thanatos, we hydrolyzed the phage nucleosides with trifluoroacetic acid and conducted GC-MS analysis after MSTFA derivatization. As reference standards, we also injected derivatized samples of 2-deoxy xylose and 2-deoxy ribose. We isolated the chromatographic signal of the characteristic pentose mass-to-charge ratio in the range from 306.50 to 307.50 m/z. The chromatogram of the hydrolyzed phage DNA showed a distinct signal containing the above-mentioned m/z (Fig. 4). However, the retention time did not match the respective retention times of any of the reference standards. These results lead us to the conclusion that we discovered a novel phage DNA modification, consisting of a deoxypentose bound to the cytosine, which we could not describe further. The amount of modified cytosine appears to be greater than 50%, and a full modification cannot be ruled out, as the preparation of phage DNA may still contain traces of host cytosines. We concluded that Shewanella phage Thanatos most probably escapes restriction cleavage and Cas attack by attaching the deoxypentose to its cytosine residues.
Bioinformatic identification of Thanatos candidate genes for cytosine modification
We further aimed to investigate the occurrence of modified nucleotides in Thanatos by searching the genome for putative modification-related gene products. As glycosylation of cytosine residues plays an important role in DNA modification, especially in T-even bacteriophages like T4, we focused on gene products possibly involved in this process. Generally, the generation of glycosylated DNA requires activated sugars and a glycosyltransferase enzyme (2). We conducted an analysis of the ORF annotations for cytosine modification-related enzymes and identified three gene products putatively involved in modifying cytosine derivatives, TH1_075, TH1_062, and TH1_119 (Table 1).
TH1_075 is predicted to encode a dCTP pyrophosphatase, which dephosphorylates dCTP to dCMP. It thus precedes the dCMP deaminase (TH1_119), which catalyzes the formation of dUMP by hydrolytic deamination of dCMP, and, by that, supplies the nucleotide substrate for a thymidylate synthase. TH1_062 is a putative deoxycytidylate hydroxymethyltransferase, which can also be annotated as thymidylate synthase. It may therefore be connected to TH1_119 and TH1_075 by methylating dUMP to generate dTMP and, in consequence, depleting the host’s dCTP pool.
As the semi-automated genome annotation gave no indication of a glycosyltransferase, we conducted a search for conserved protein domains using CDSEARCH (27) in all annotated protein sequences. By this, we identified hypothetical protein TH1_060 (QJT71744.1; 561 amino acids), which has a predicted glycosyltransferase domain at its N-terminus and a phosphotransferase domain at the C-terminus and may also play a role in DNA modification (Table 1). We further employed structure-based homology search by subjecting the neighboring ORFs of TH1_060, TH1_057 through TH1_065 to ColabFold (30–32), figuring that functionally associated genes likely occur in the vicinity of each other. Resulting predicted structures were input into DALI (33) to identify structural homologs in the PDB (34, 35). By this approach, TH1_063 (QJT71747.1; 300 amino acids) was identified as an additional glycosyltransferase candidate enzyme due to its structural homology to known glycosyltransferase enzyme folds (see Fig. S6; Table S1).
TH1-063 is a UDP-xylose pyrophosphorylase, possibly also responsible for the generation of activated deoxypentose for cytosine modification
We expressed the two glycosyltransferase structural homologs TH1_060 and TH1_063 with N-terminal 6x His tags in E. coli using the arabinose-inducible pBAD expression system (26). After expression and purification, we initially included both proteins in an enzyme assay aiming to assess a putative NTP pentose pyrophosphorylase function, which could provide the activated sugar precursors for cytosine modification. We chose xylose-1-phosphate as a representative phosphorylated pentose precursor, as a phosphorylated deoxypentose was not commercially available to the best of our knowledge. Using LC-MS, we analyzed the reaction mixture consisting of xylose-1-phosphate, one of the five NTPs (ATP, CTP, GTP, UTP, dTTP), pyrophosphatase, and either of the phage enzymes TH1_060 and TH1_063. The reaction was incubated at 30°C for 3 h. In the reaction containing UTP, xylose-1-phosphate, and TH1_063, in addition to the chromatogram peaks of the reaction substrates, a peak eluting after 18 min with the prevalent m/z 535.0360 was detected and fragmented using tandem MS in negative ionization mode. Its fragment masses clearly mirror the fragment pattern from tandem MS of UDP-xylose, which was used as a standard (Fig. 5). These results show that the TH1_063 protein has a UDP-xylose pyrophosphorylase function, thereby possibly also providing the precursor for the attachment of a deoxypentose to cytosine in a parallel fashion.
Enzymatic activity of TH1_063. (A) Extracted ion chromatogram (negative ionization mode) of m/z 535.0360, which corresponds to UDP-xylose. Upper frame shows no enzyme control, while the assay containing the purified TH1_063 enzyme is shown in the bottom frame. (B) Tandem MS of m/z 535.00 comparing TH1_063 assay and UDP-xylose standard. (C) Reaction schematics of likely UDP-xylose pyrophosphorylase enzyme TH1_063.
Next, we employed both TH1_060 and TH1_063 in an assay containing UDP-xylose and plasmid DNA to examine whether either of the enzymes is able to transfer activated xylose to cytosine. After incubation for 3 h at 30°C, plasmid DNA was purified using magnetic beads and digested into single nucleosides at 37°C overnight. The nucleosides were separated and analyzed using LC-MS in positive ionization mode, and peaks corresponding to the canonical nucleosides were readily identified. However, we did not detect xylosylated deoxycytosine in our reaction mixture. We further repeated the assays using dCMP, dCTP, and cytosine as acceptor substrates but ended up with the same negative results (not shown).
Inactivation of TH1_060 drastically affects phage propagation
To determine if the putative glycosyl transferase TH1_060 has a role in DNA modification, we aimed at directly inactivating the enzyme in phage Thanatos. To this end, the first two bases were deleted from the ATG start codon of the gene, resulting in an immediate frame-shift mutation within the open reading frame and loss of the gene product (Thanatos ΔTH1_060). Of note, the mutation could only be obtained when the gene product TH1_060 was provided by the host cells by ectopic production from a plasmid. Accordingly, the modified phage Thanatos ΔTH1_060 was only able to proliferate in the presence of the gene product but was unable to proliferate in a host in the absence of TH1_060 (Fig. 6). The amount of active Thanatos TH1_060 phages was too small (<10 pfu/mL) for DNA isolation and direct modification analysis. From this, we conclude that the activity of TH1_060 DNA is required for survival and/or propagation in its host S. oneidensis, which is also in line with a potential role in phage DNA modification.
Thanatos gene TH1_060 is essential for a successful infection of S. oneidensis MR-1. Shown are spot assays of S. oneidensis MR-1 ∆LambdaSo ∆MuSo2 (MR-1) with (+) or without (−) the complementation plasmid pBBR1MCS-5_TH1_60 (pBBR_TH1_060). The cultures were plated on LB-plates, and a serial dilution of phage lysate of the Thanatos wild type (Wt) or the TH1_060 frameshift mutant (∆TH1_060) was spotted on top of it. 100 was according to 1 × 108 PFU/mL. Shown are representative assays of three independent experiments.
DISCUSSION
Here, we report the epigenetic characterization of Shewanella phage Thanatos, including a novel phage adenine methylase and the discovery of a phage DNA modification, which we determined to be a deoxypentose added to deoxycytosine. Previous studies showed that nucleotide modifications at the cytosine can effectively decrease the efficiency of DNA-targeting systems, such as CRISPR-Cas, and thereby protect the phage DNA within the critical time period after injection into the host cell (9, 14, 36–38). Our results strongly indicate that this is similarly true for the deoxypentose modification of deoxycytosine of Shewanella phage Thanatos, at least against the tested restriction by R-M systems and interference by Cas9 and an I-Fv system. Of note, S. oneidensis has 21 putative phage defense systems (according to the PADLOC database; https://padloc.otago.ac.nz/padloc/ [39]). Some of them are predicted to target DNA (i.e., restriction-modification systems of type I, II, and IV; MADS). If the Thanatos DNA modification provides any protection against these systems, it will be the subject of further studies.
When using nanopore single-molecule sequencing for modification detection, we noticed that high-quality basecalling of sequencing data from Thanatos was only possible when PCR-amplifying the genomic DNA prior to sequencing. Read sequences determined from native DNA were of poor quality and impossible to align to the Thanatos genome sequence, which made modification detection from nanopore raw data unachievable. Interestingly, signal-level data yielded the impression of long reads, while the guppy basecaller only provided short basecalled sequences. This means that pentosyl-deoxycytosine residues pass through the nanopore during sequencing and induce measurable voltage changes, but basecalling models are not equipped to call the respective base sequence. Strongly reduced read quality to the extent of complete inability to map the sequencing reads has also been described with nanopore sequencing of phage RB69 DNA, which contains arabinosyl-hmC (13). This challenge remains open for research efforts since the training of basecalling models for bases that strongly differ from the four consensus DNA bases is not easily possible to date, because nearly all tools for modified base detection rely on reference alignment after basecalling (5).
Initially, we had identified the protein TH1_126, which was predicted to be an adenine methylase, and characterized the enzyme by production in the methylase-free E. coli strain ER3413 and subsequent nanopore sequencing of the strain’s genomic DNA. Methylation calling and motif analysis yielded a 5′-ATC-3′ recognition site, which was confirmed using a DpnI restriction assay. Since DpnI only recognizes GATC and CATC motifs, we would have preferred to have further proof of AATC and TATC (or even directly for ATC) methylation. However, there were no suitable restriction enzymes that we could use for the respective assay. Shewanella oneidensis MR-1 possesses two regularly methylated palindromic motifs (GATC and ATCGAT) (40), which both contain the sequence motif targeted by the phage methylase. This could point to a protective effect of methylation from host restriction and to a common evolutionary origin of host and phage methyltransferases, which is also supported by the homology of TH1_126 and host methyltransferase SO_0289, which targets GATC motifs (40). However, since S. oneidensis MR-1 does not contain any known restriction nuclease cleaving unmethylated GATC motifs (40), methylation by TH1_126 may rather protect the Thanatos genome from cleavage when infection takes place in another host organism. As DNA methylation in S. oneidensis MR-1 also plays a role in processes like DNA mismatch repair and regulation of genome replication (40), methylating its genomic DNA may also help Thanatos in synchronizing itself with the host. LC-MS analyses did show detectable amounts of mA in digested Thanatos DNA, implying that phage methyltransferase TH1_126 was active, at least to a certain extent. It may as well serve another function during phage infection, e.g., methylation of host DNA. It is known from the T4 phage that mutants devoid of glucosyl-hmC have a higher methylation frequency of 5′-GATC-3′ motifs, suggesting an inhibitive effect of glucosyl-hmC on adenine methylase (9), which may also be the case with the Thanatos deoxypentose cytosine modification.
Because methylated DNA bases are generally not known to dramatically impair nanopore sequencing, we searched for additional DNA modifications. We found that the Thanatos genome encodes putative dCMP deaminase, dCTP pyrophosphatase, and deoxycytidylate hmC transferase/thymidylate synthase enzymes, which are likely involved in nucleotide metabolism or modification. The presence of these three enzymes pointed to a potential modification at the cytosine nucleotides. Accordingly, by DNA digestion and single-nucleoside analysis using mass spectrometry techniques, we confirmed such a cytosine modification, which we determined to be a deoxypentose moiety. Our data from LC-MS clearly shows that the deoxypentose is bound directly to the cytosine base via an O-glycosidic bond. This contrasts the findings from T-even bacteriophages T2, T4, and T6, where glucose is bound via an additional hydroxymethyl group to the nucleobase (9, 41).
While three phage-encoded proteins, TH1_062, TH1_075, and TH1_119, are good candidates to be involved in the synthesis of such a modification, candidate enzymes for the generation of activated sugar and a sugar transferase were lacking. Using CDSEARCH, we additionally identified hypothetical protein TH1_060, with both predicted glycosyltransferase and phosphotransferase domains. BLASTP analysis returned 106 putative sequence homologs (cut-offs: 70% query coverage, 40% identity) from different T4-like phage genomes, with the best hit being a hypothetical protein from Yersinia phage vB_YenM_TG1 at 55.81% identity (YP_009200315.1). The corresponding gene products were classified as hypothetical proteins, “bifunctional protein,” or “NTP-transferase domain-containing protein.” None of these proteins has been characterized experimentally. However, the related Escherichia phage RB69 possesses coding region ORF052_053c, which is a homolog of TH1_060, with an amino acid identity of 53.2% with 100% query coverage between ORF053_052c and TH1_060. In Escherichia phage RB69, ORF052_053c is presumed to play a role in the generation of UDP-arabinose, which is essential for the formation of arabinosylated cytosine (13), which also hints at a possible role of TH1_060 in the generation of activated xylose necessary for subsequent sugar transfer to the cytosine.
The AlphaFold software (31, 32) was used to predict protein structures of proteins TH1_057 through TH1_065. Both TH1_060 and TH1_063 structurally resembled known nucleotidyltransferase and glycosyltransferase proteins and were therefore expressed in E. coli and characterized using in vitro enzyme assays. We could show that TH1_063 serves as a UDP-xylose pyrophosphorylase, by structural homology based on AlphaFold predictions. After proving the sugar-activating function of TH1_063, one can also speculate that it would accept a deoxypentose like deoxyribose-1-phosphate as a substrate. This putative activity will be the subject of further studies. We could not identify the function of TH1_060 in vitro, which may be a bifunctional enzyme possibly involved in transferring the deoxypentose to the cytosine. Another issue is posed by the hydroxyl group we found at the cytosine after tandem MS fragmentation, as we currently have no valid hypothesis on how the hydroxyl group is attached to the modified cytosine. Of note, we could show by direct inactivation of TH1_060 on the phage chromosome that the encoded protein is essential for normal phage propagation. Previous studies on the T4 phage showed that establishing a T4 mutant with non-modified cytosines required not only the inactivation of phage genes involved in the modification itself. In addition, nucleases targeting non-modified DNA and a protein blocking transcription of DNA with canonical cytosines had to be removed or mutated (42, 43). This might explain why Thanatos deleted in TH1_060 is unable to proliferate, which made it impossible to harvest sufficient DNA for direct analysis. A second possibility is that the host’s phage defense systems are efficient against Thanatos, lacking the DNA modification. Taken together, TH1_060 remains a good candidate as a glycosyltransferase in Shewanella phage Thanatos cytosine modification. Further studies are required to fully identify the protein machinery, which synthesizes the modified cytosine and its precursors and may also include host enzymes, as demonstrated for the Escherichia T4 phage (36, 42, 44).
In summary, our study expands the range of general and phage-mediated DNA modifications, demonstrating that the full spectrum of DNA modification-based phage-defense systems remains elusive. While preparing this manuscript, a new study became available, which showed that O-linked deoxypentose modifications are more prevalent in phages and provide protection against various phage-defense systems targeting phage DNA (14). We expect that further modifications will be found in the future, as well as bacterial counter-defense systems that will circumvent this phage mode of protection.
MATERIALS AND METHODS
Strain cultivation
Bacterial strains used in this study are summarized in Table S2. Shewanella oneidensis MR-1 strains were generally cultivated in LB medium at 30°C or room temperature. E. coli strains were grown in LB at 37°C. For solidification, 1.5% (wt/vol) agar was added to the medium. When appropriate, media were supplemented with 50 µg mL^−1^ kanamycin and/or 10 µg mL^−1^ chloramphenicol. Gene expression from plasmids was induced by the addition of 0.2% (wt/vol) L-arabinose from a 20% (wt/vol) stock solution. For the conjugation strain E. coli WM3064, media were supplemented with 300 µM of 2,6-diaminopimelic acid.
Strain constructions
To determine whether Thanatos-1 is susceptible to CRISPR-Cas-mediated cleavage, we used two different systems, the CRISPR-Cas system IFv from S. putrefaciens CN-32 (20) and a system based on Cas9 (22). Corresponding primers and oligonucleotides are listed in Tables S3 and S4. To generate an inducible system for the CRISPR IFv system, the corresponding DNA fragment was amplified by PCR from pCas1 (45) with the addition of a C-terminal FLAG tag-encoding sequence to cas6f, the most downstream gene of the cluster, to serve as an expression and production control. The cascade cassette was recombined behind the araC-P_BAD_ cassette, which was amplified from pBAD33, by Gibson assembly (46) into EcoRV linearized broad-host range vector pBBR1-MCS5, yielding pCASCADE_RBS_wtCas3. The guide RNA was provided from a second, independent vector. To this end, the appropriate sequence was cloned behind the arabinose-inducible promoter in the vector pBAD33. Both plasmids were verified by sequencing and co-transformed into S. oneidensis by electroporation. To generate a Cas9-based system, an appropriate DNA fragment encoding the guiding sequence was cloned into BsaI-linearized vector pTS021 (22) by Gibson assembly. Correct insertion was confirmed by sequencing, and the vector was transformed into S. oneidensis MR-1 via electroporation. The oligonucleotides to amplify appropriate spacers from the phage genomes were designed using the CutSPR software (47).
Phage isolation
To obtain Thanatos-1 and LambdaSo phage particles, 10 mL LB culture of S. oneidensis MR-1 ΔLambdaSo ΔMuSo2 was inoculated with 250 µL of an overnight culture and incubated shaking at room temperature until an OD_600_ of 0.4 was reached. The Thanatos mutant containing the frameshift of TH1_60 was enriched by using an S. oneidensis MR-1 ∆LambdaSo ∆MuSo2 strain possessing the complementation vector. After incubation, 100 µL of phage stock was added, and the cultures were further incubated for 24 h. After that, residual cells and cell debris were removed by centrifugation (15,000 × g, 5 min, RT). The supernatant was filtered (0.45 µm), and 1.5-mL aliquots were stored at 4°C. The concentration of plaque-forming units was calculated as described below.
Phage infection assays
To determine the number of infectious, plaque-forming phage particles, 20 mL LB culture of S. oneidensis MR-1 ΔLambdaSo ΔMuSo2 (23) was inoculated with 250 µL of an overnight culture and incubated shaking at room temperature until an OD_600_ of 0.4 was reached. If necessary, 0.2% arabinose was added to induce production of the CRISPR modules in the host strain. Then 15 mL of culture was harvested by centrifugation (10 min, 4,000 × g, 4°C), resuspended in 6 mL LB medium, and kept on ice. A serial dilution (10^−1^–10^−9^) of the phage suspension was prepared in LB, and 100 µL of each dilution was pipetted into a reaction vial. Next, 100 µL of the bacterial suspension was added, well mixed with the phage dilution, and incubated for 20 min at 30°C to allow infection of the cells. A control vial did not contain any phages. After incubation, the phage-host culture was mixed with 5 mL of 0.5% (wt/vol) soft agar (in LB) and used to overlay a regular LB agar plate. If induction of the plasmid-borne CRISPR system was required, the plates contained kanamycin and/or chloramphenicol and 0.2% (wt/vol) arabinose. Plates were incubated at 30°C overnight. Plaques formed by the phage particles were counted, and the amount of PFU per mL preparation was calculated.
As a second assay to determine the number of PFU, spot assays were carried out. To this end, 400 µL of an S. oneidensis MR-1 ΔLambdaSo ΔMuSo2 culture was mixed with 7 mL 0.5% (wt/vol) LB top agar and spread on regular LB plates (containing kanamycin and arabinose if necessary). After solidification of the top agar, 1.5 µL of a serial phage dilution was carefully spotted onto the surface. Plates were incubated at 30°C overnight and scored for plaque formation. To determine the amount and isolate recombinant phages, a plaque assay was performed. Therefore, 100 µL of a stationary culture was mixed with 100 µL diluted phage-containing supernatant in 7 mL 0.5% (wt/vol) soft agar and poured on an agar plate. After an overnight incubation at room temperature, the amount of PFU/mL could be calculated, or the plaques could be isolated.
Genetic inactivation of TH1_060
A method based on Shitrit et al. (48) was used for gene engineering of Thanatos. To this end, a cloning vector pBAD33 containing 500 bp homologous overhangs up- and downstream and a 69 bp Tag was generated by Gibson assembly. In one of the overhangs, the A and T of the TH1_060 start codon were missing, which led to the frameshift. For phage engineering, a mid-exponential culture of S. oneidensis MR-1 ∆LambdaSo ∆MuSo2 containing the engineering vector and the complementation plasmid expressing TH1_60 was infected with Thanatos at an MOI of 0.01 and incubated overnight at room temperature. The supernatant of this culture was filtered (0.2 µm) 24 h post-infection. To verify the presence of recombinant phages, a Phusion PCR was performed. For this, the primers annealed within the integrated Tag and outside the 500 bp homologs overhang region. Afterward, the phages were enriched by dividing mid-exponential S. oneidensis MR-1 ∆LambdaSo ∆MuSo2 cells containing the complementation vector into several 96-well plates (200 µL per well) and adding 100 phages per well. Twenty-four hours after infection, each well was screened for recombinant phages by Phusion PCR. The supernatant of the well containing recombinant phages was collected, and the previous enrichment step was repeated. However, this time, 10 phages per well were used to infect the cells. Again, the supernatant of the wells was filtered, which contained recombinant phages according to PCR. These supernatants were used to perform a plaque assay. Thereafter, plaques were isolated and incubated overnight with S. oneidensis MR-1 ∆LambdaSo ∆MuSo2 cells containing the complementation vector. These cultures were screened by PCR for single-plaque-purified recombinant phages. The single-plaque purification was repeated one more time. At the end, positive PCR products were checked again by sequencing to ensure the presence of the frameshift. The supernatant containing the purified recombinant phages was stored at 4°C.
Methylation analysis
For heterologous expression of TH1_126, methylase-free E. coli strain ER3413 was obtained from Yale Coli Genetic Stock Center (CGSC#: 14167, New Haven, USA). TH1_126 was amplified from Thanatos-1 DNA by PCR and cloned into the pBAD24 vector at the NcoI restriction site using Gibson assembly (46). The Gibson assembly reaction was transformed to E. coli DH5α, and successful vector construction was verified using Sanger sequencing. Overexpression using ER3413 was induced via overnight cultivation on LB-agar containing 0.1% arabinose. Subsequently, genomic DNA was extracted using Macherey & Nagel Microbial DNA kit (Düren, Germany). Nanopore sequencing was performed on the GridION Mk1 using R9.4.1 flow cells and the Rapid Barcoding kit (RBK-004). Basecalling was carried out using guppy v5.0.11 (available to ONT customers via their community site: https://community.nanoporetech.com/), while tombo v1.5 (49) was used for methylation analysis (https://github.com/nanoporetech/tombo). In a process called “resquiggling,” tombo aligns the raw current signals to the reference sequence. In a second step, tombo’s 6 mA alternative model method was applied to predict the fraction of modified bases for each adenosine in the genomic reference sequence. By comparing the results of an unmodified negative control (in this case, ER3413+pBAD24) to those of the modified data set, false predictions due to model inaccuracies can be identified. As the exact positioning of the modified base in an interval of roughly five bases cannot be determined due to the inherent nature of nanopore data, the 100 most strongly modified regions of 10 bp were extracted and used as input for the MEME motif suite (50). MEME was used with standard settings, including “revcomp” mode, which allows a motif to be detected on both strands. MEME output was visualized using WebLogo v3.7.4 (51). The detected, distinct sequence motif (5mATC) was then analyzed by extracting all genomic loci containing NNATCNN and calculating the mean fraction of modified bases across all occurrences of the specific motif, based on the predictions generated by tombo. Additionally, the motif was extended up- and downstream one position at a time, and again, the mean fraction of modified bases for the resulting motifs across all respective genomic loci was calculated and visualized.
Phage purification and DNA extraction
DNA for restriction analysis was extracted as follows: 10 mL of Thanatos lysate was combined with 5 mL of a PEG solution (10% PEG 8000, 1 M NaCl) and mixed by inversion. After overnight incubation at 4°C, the lysate was centrifuged at 9,000 rpm for 45 min. The pellet was resuspended in 500 µL MgSO_4_ and mixed with 1.25 µL DNaseI (1.25 µg) and RNaseA (20 mg). The mixture was then incubated for 1 h at 37°C. In the next step, 1.25 µL Proteinase K (20 µg), 25 µL 10% SDS, and 20 µL 0.5 M EDTA, pH 8, were added, mixed, and incubated for 1 h at 60°C. After cooling to room temperature, the same volume of phenol:chloroform:isoamyl alcohol (25:24:1) was added, inverted, and centrifuged for 5 min at 6,000 rpm. The supernatant was transferred to a new tube and the purification step was repeated. For the next purification step, an equal volume of chloroform was added to the supernatant and centrifuged. Finally, 1/10 of the volume of 3 M NaOAc pH 7.5 and 2.5× of the volume of ice-cold, 100% ethanol were added to the supernatant before incubating the mixture overnight at −20°C. The tube was centrifuged at maximum speed for 20 min, the supernatant was removed, and the tube was filled halfway with 70% ethanol. Following centrifugation for 2 min, the washing step was repeated, and the ethanol was removed. After the ethanol had completely evaporated, the pellet was dissolved in water, and the concentration of phage DNA was measured using a spectrophotometer.
The DNA of S. oneidensis MR-1 was isolated using the Omega BIO-TEK Bacterial DNA Kit. The concentration was measured using a spectrophotometer. For chromatographic purification of Thanatos phage DNA, the lysed culture was centrifuged (11,000 × g, 10 min, 4°C) and the supernatant was filtered through a 0.2 μm sterile filter. A 1-mL CIMmultus OH-1 Advanced Composite Column (pores 6 μM) was used for chromatography on an ÄKTAprime plus system. Phage lysate was diluted 1:1 with 3 M K_2_HPO_4_, KH_2_PO_4_ buffer (pH 7.0), and loaded on the OH column (flow rate 5 mL/min). Buffer A (1.5 M K_2_HPO_4_, KH_2_PO_4_; pH 7.0) was used for washing and for elution, a linear gradient from 0% to 100% of Buffer B (20 mM K_2_HPO_4_, KH_2_PO_4_; pH 7.0) was applied. The phage eluate was dialyzed against 10 mM K_2_HPO_4_, KH_2_PO_4_ pH 7.0 buffer using Slide-A-Lyzer Dialysis cassettes (ThermoFisher Scientific, Waltham, USA) for 48 h at 4°C. Afterward, phage DNA was extracted using the Phage DNA Isolation kit (Norgen Biotek, Thorold, Canada).
Endonuclease detection
The genomic DNA from the bacteria or phage (ca. 300 ng/µL) was mixed with ApaI, XbaI, Eco47I, SmaI, KpnI, XhoI, EcoRV, or SspI and incubated for 3 h at 37°C. Subsequently, DNA loading dye was added to the digested DNA and loaded onto a 0.7% agarose gel for fragment size separation.
Exonuclease digestion
Genomic phage DNA was digested using either a combined treatment of DNaseI and ExoI to create single nucleotides or a commercially available nucleoside digestion mix (NEB, Ipswich, USA).
Liquid chromatography-mass spectrometry
For LC-MS, the Ultimate 3000 (Thermo Fisher Scientific, Germering, Germany) UPLC system coupled to a microTOF-Q hybrid quadrupole/time-of-flight mass spectrometer (Bruker Daltonics, Bremen, Germany) was used, equipped with an electrospray ionization source. For liquid chromatography, a SeQuant ZIC-pHILIC 5 µm Polymeric column 150 × 2.1 mm (Merck Millipore, Darmstadt, Germany) was used. For chromatographic separation, 2 µL of the sample was injected and eluent A (20 mM NH_4_HCO_3_, pH 9.3, adjusted with aqueous ammonia solution) and eluent B (acetonitrile) were applied at a flow rate of 0.2 mL min^−1^ by use of the following gradient: 0 min B: 90%, 30 min B: 50%, 37.5 min B: 50%, 40.0 min B: 90%, 60 min B: 90%. Mass spectrometry was performed in negative ionization mode. The temperature of the dry gas and the capillary was set to 180°C. The scan range of the MS was set to 50–1,000 m/z. For interpretation of the mass spectrometry data, the software Data Analysis 4.0 (Bruker Daltonics, Bremen, Germany) was used.
After dephosphorylation of nucleotides to nucleosides using shrimp alkaline phosphatase (ThermoFisher Scientific, Waltham, USA), LC-MS/MS measurements were performed. In this case, we applied liquid chromatography by using a Bluespher C-18 column 100 × 2 mm (Knauer, Berlin, Germany). The injection volume was 5 µL and the flow rate was set to 0.4 mL/min. For chromatographic separation, the solutions eluent A (water + 0.1% formic acid) and eluent B (acetonitrile + 0.1% formic acid) were used in the following gradient: 0 min B: 5%, 15 min B: 5%, 20 min B: 80%, 33 min B: 5%. Mass spectrometry was performed in positive ionization mode. The temperature of the dry gas and the capillary was set to 180°C. The scan range of the MS was set to 50–1,000 m/z. MS/MS measurements were performed in MRM mode by first isolating the single mass 360.10 m/z and then applying various collision energies from 10 to 30 eV. For interpretation of the mass spectrometry data, the software Data Analysis 4.0 (Bruker Daltonics, Bremen, Germany) was used.
Heterologous expression and purification of His-tagged Thanatos-1 proteins in E. coli
Phage proteins TH1-060 and TH1-063 were amplified from the Thanatos genome and were cloned into the NcoI site of the pBAD24 expression vector. Subsequently, N-terminal 6x His-tags were added via primer overhangs. Overnight cultures of E. coli BL21(DE3)Gold (Agilent, Santa Clara, USA) were inoculated at an OD_600_ of 0.1 into 50 mL LB-Amp in 200 mL shaking flasks at 37°C with 180 rpm and protein expression was induced at OD_600_ 0.5 with a final concentration of 0.3% L-arabinose, while cultures were cooled to 16°C and cultivation was continued overnight. Cells were harvested by centrifugation (10 min, 5,500 × g, 4°C), resuspended in Tris-HCl (20 mM, pH 7.5), and lysed on ice using sonication (8 × 30 s). Cell debris was collected at 14,000 × g, 4°C, and the supernatant was added to Ni-NTA columns, and His-tagged proteins were purified according to the Protino Ni-TED protein purification kit (Macherey-Nagel, Düren, Germany).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Luria SE, Human ML. 1952. A nonhereditary, host-induced variation of bacterial viruses. J Bacteriol 64:557–569. doi:10.1128/jb.64.4.557-569.195212999684 PMC 169391 · doi ↗ · pubmed ↗
- 2Weigele P, Raleigh EA. 2016. Biosynthesis and function of modified bases in bacteria and their viruses. Chem Rev 116:12655–12687. doi:10.1021/acs.chemrev.6b 0011427319741 · doi ↗ · pubmed ↗
- 3Lee Y-J, Weigele PR. 2021. Detection of modified bases in bacteriophage genomic DNA. Methods Mol Biol 2198:53–66. doi:10.1007/978-1-0716-0876-0_532822022 · doi ↗ · pubmed ↗
- 4Tourancheau A, Mead EA, Zhang X-S, Fang G. 2021. Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing. Nat Methods 18:491–498. doi:10.1038/s 41592-021-01109-333820988 PMC 8107137 · doi ↗ · pubmed ↗
- 5Xu L, Seki M. 2020. Recent advances in the detection of base modifications using the Nanopore sequencer. J Hum Genet 65:25–33. doi:10.1038/s 10038-019-0679-031602005 PMC 7087776 · doi ↗ · pubmed ↗
- 6Wallace EVB, Stoddart D, Heron AJ, Mikhailova E, Maglia G, Donohoe TJ, Bayley H. 2010. Identification of epigenetic DNA modifications with a protein nanopore. Chem Commun (Camb) 46:8195–8197. doi:10.1039/c 0cc 02864 a 20927439 PMC 3147113 · doi ↗ · pubmed ↗
- 7Rand AC, Jain M, Eizenga JM, Musselman-Brown A, Olsen HE, Akeson M, Paten B. 2017. Mapping DNA methylation with high-throughput nanopore sequencing. Nat Methods 14:411–413. doi:10.1038/nmeth.418928218897 PMC 5704956 · doi ↗ · pubmed ↗
- 8Yaung SJ, Esvelt KM, Church GM. 2014. CRISPR/Cas 9-mediated phage resistance is not impeded by the DNA modifications of phage T 4. P Lo S One 9:e 98811. doi:10.1371/journal.pone.009881124886988 PMC 4041780 · doi ↗ · pubmed ↗
