Computational Pipeline for Targeted Integration and Variable Payload Expression in Bacteriophage Engineering
Jonas Fernbach, Emese Hegedis, Martin J. Loessner, Samuel Kilcher

TL;DR
This paper introduces a computational pipeline to improve bacteriophage engineering by predicting optimal sites for inserting genetic payloads, validated through bioluminescent reporter experiments.
Contribution
A novel computational pipeline for targeted integration of genetic payloads in phage genomes, using machine learning predictions and validated experimentally.
Findings
A method was developed to identify optimal intergenic loci for payload insertion using PhagePromoter predictions.
Three recombinant phages with reporter genes at distinct loci showed expression levels matching computational predictions.
The pipeline enables temporal control of gene expression corresponding to early, middle, or late gene clusters.
Abstract
Bacteriophages offer a promising alternative to conventional antimicrobials, especially when such treatments fail. While natural phages are viable for therapy, advances in synthetic biology allow precise genome modifications to enhance their therapeutic potential. One approach involves inserting antimicrobial genetic payloads into the phage genome. These are typically placed behind late-expressed genes, such as the major capsid gene (cps). However, phages engineered with toxic payloads often fail to produce viable progeny due to premature host shutdown. To broaden the scope of viable genetic insertion sites, we developed a method to identify intergenic loci with favorable expression profiles using the machine learning-based promoter prediction tool, PhagePromoter. Guided by these predictions, we designed a computationally assisted engineering pipeline for targeted genomic payload…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
1
2
3
4|
| Multi-fasta | Big Files | Promoter Core | Score or Probability | Output Format | Additional Notes | Citation |
|---|---|---|---|---|---|---|---|
| Bprom | No | Yes | Yes | Yes | Text on screen |
| |
| bTSS-finder | Yes | Yes | Yes | Yes | Text file | Sequence length limited to 500 bp. Not executable on MacOS. |
|
| BacPP | No | No | No | Yes | Text on screen |
| |
| Virtual Footprint | No | Yes | Yes | Yes | Text on screen | Requires.gb extension. Sequence length limited to 1000 bp. |
|
| IBBP | No | Yes | No | Yes | Text file | Windows only |
|
| iPro70-FMWin | Yes | Yes | No | Yes | Text on screen |
| |
| iPro70-PseZNC | Yes | Yes | No | Yes | Text on screen | Fasta input file requires.txt extension. |
|
| iPro54-PseKNC | Yes | Yes | No | Yes | Text on screen | Fasta input file requires .txt extension. |
|
| 70ProPred | Yes | Yes | No | No | Text on screen | No input file (has to be entered manually on webpage), |
|
| CNNProm | No | No | No | Yes | Text on screen |
| |
| MULTiPly | Yes | Yes | No | No | Text on screen | Slow for large data sets |
|
| iPromoter-2L | No | No | No | No | Text on screen |
| |
| Promoter-Hunter | No | No | Yes | Yes | Text on screen | Requires predefined motif matrix |
|
| SAPPHIRE | Yes | Yes | Yes | Yes | Text file |
| |
| XSTREME | Yes | Yes | Yes | Yes | Text file |
| |
| Phage-Promoter | Yes | Yes | Yes | Yes | Text file | Trained
on validated phage-encoded promoters, |
|
- —Schweizerischer Nationalfonds zur F??rderung der Wissenschaftlichen Forschung10.13039/501100001711
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBacteriophages and microbial interactions · Genomics and Phylogenetic Studies · Bacterial Genetics and Biotechnology
Introduction
Bacteriophages, ubiquitous viruses with the innate capacity to infect and lyse bacteria, have long been considered potential alternatives to antibiotics. While antibiotics have largely supplanted bacteriophages in Western medicine, the escalating global prevalence of multidrug resistant bacterial infections necessitates urgent exploration of alternative treatments like phage therapy. ?−? ? ? ? Although numerous case studies demonstrate the efficacy of phages in treating antibiotic-resistant infections ?−? ? natural phages present challenges such as the rapid emergence of phage-resistant bacteria, ?,?,? immunogenicity during extended treatment,? and a restricted host range that can limit their applicability. To mitigate these challenges and augment the clinical utility of phages, various genetic engineering techniques have been devised to enhance their therapeutic efficacy. ?,? Specifically, the genetic “arming” of bacteriophages with antimicrobial effector payloads holds significant promise for advancing phage engineering in therapeutic contexts. ?,? A prevalent method for the precise, markerless integration of genetic payloads involves homologous recombination, which is commonly followed by either CRISPR-Cas9-assisted counterselection or plaque screening to identify recombinant phages.? The conventional approach places payloads downstream of structural genes that are highly expressed late in the infection cycle. Two such genes are the major capsid gene (cps) and holin/endolysin (ply) cassette. ?,? Although this method has proven useful in the past ?,?−? ? several caveats exist and could potentially be mitigated through a more diverse selection of insertion loci. Inserting payloads after the cps gene can inadvertently affect transcription and may not consistently yield viable phage progenya concern applicable to all insertion sites, particularly those that are uncharacterized or inaccurately annotated. Furthermore, these locations restrict the temporal expression of the payload to the end of the infection cycle. ?,? In certain applications, early and rapid payload expression could be advantageous. For example, reporter phages, which express signaling payloads for pathogen detection in diagnostics, could benefit from swift payload expression for more rapid signal detection. Another imaginable approach is to increase payload effect through the rapid and high expression of intracellular-acting proteins early in the infection cycle. This enhancement could also bolster antimicrobial efficacy against bacterial hosts equipped with innate or adaptive defense mechanisms. Low or intermediate expression levels may also be better suited for payloads where phage amplification at the infection site is beneficial, but high toxicity of the expressed payload may lead to premature cell death and abortion of the infection cycle. Given these considerations, there is a need to explore insertion sites with variable expression levels throughout the infection cycle. Developing reliable methods for predicting these sites and their corresponding expression levels could significantly advance genome engineering and phage therapy.
Understanding the binding sites for transcription factors serves as an initial step in gene expression regulation and can be deduced from the genetic sequence. The growing sophistication of machine learning (ML) algorithms, particularly those that infer transcription factor and polymerase binding sites, coupled with the influx of new bacteriophage sequence data, supports the integration of this knowledge into phage engineering methodologies. This could enhance both the precision and flexibility of traditional phage engineering approaches. Here, we developed and assessed a tool that employs ML for promoter inference to predict payload expression at optimal insertion sites, integrating this approach into our existing phage-engineering pipeline. Utilizing homologous recombination (HR) and subsequent bioluminescent plaque screening, we engineered three Staphylococcus-targeting Kayvirus variants, each equipped with a bioluminescent nanoluciferase (nluc) reporter payload. These payloads were inserted at loci identified through our predictive pipeline. The well-studied biology of Staphylococcus phage K in the literature enabled us to align our computational predictions and experimental outcomes with preexisting gene expression data and validated promoter regions.? The engineered phages constructed in this study exhibited expression levels consistent with our predictions. This was true both for the absolute quantity of the expressed payload and the timing of expression. These findings were further corroborated by previously characterized temporal gene expression data for K. In summary, our newly developed tool enables quick identification of gene insertion sites, thereby offering a means to fine-tune payload expression levels in therapeutic phage scaffolds by identifying and leveraging variable promoter strengths.
Results
Comparison of Various Promoter Prediction Algorithms
A variety of algorithms exist to date which are designed to predict promoters from genomic sequence data. These range from simple, consensus motif-based approaches? to more complex, state-of-the art ML algorithms such as support vector machines (SVM), random forests (RF), and convolution neural networks (CNN).? Our approach is designed to be universally applicable to whole-genome sequences, irrespective of phage or host species. This universality required that our chosen method for promoter prediction meet several predefined parameters. To allow for high-throughput processing of multiple genomes simultaneously, the method of choice had to be applicable to complete genome sequences and compatible with, e.g., a multi-FASTA input format. While our pipeline is compatible with such inputs for large-scale analyses, in the present study we applied it only to the complete genome sequence of Staphylococcus phage K (GenBank: KF766114.1), which served as our experimental scaffold. Our approach is driven by the hypothesis that ML-based probability scores are inherently linked to promoter strengths. We therefore required machine-learning implementations which quantify the probability that a predicted promoter was correctly determined as such. A comprehensive overview of the algorithms evaluated for integration into our pipeline is presented in Table.
1: Promoter Prediction Software and Various Parameters Relevant for Seamless Integration into Our Pipeline
We detail each method’s specifications concerning our predetermined parameters, as well as additional factors which ruled out different methods. Ultimately, we opted for the recently developed algorithm PhagePromoter, which employs both support vector machine (SVM) and artificial neural network (ANN) methodologies.? This algorithm was the only candidate that met all our predefined criteria. PhagePromoter has the added benefit of being adapted toward the prediction of phage-specific promoters based on experimentally verified, phage-specific input training data.
Promoter Prediction and Determination of Suitable Genetic Loci
for Payload Insertion in Staphylococcus Phage K
We chose the well-characterized, strictly lytic Staphylococcus-infecting Kayvirus K as our engineering scaffold.? The annotated, complete sequence record was used (GenBank: KF766114.1). To maintain simplicity in our approach, we primarily concentrated our promoter search within intergenic regions. Acknowledging the potential for promoter regions to overlap with coding sequences, we expanded our search to include 50 base pairs at both ends of each intergenic region. The gene product gp141 of K, annotated as comprising two exons interspersed with genes gp142 and gp143, was considered as a single contiguous sequence to eliminate potential insertion sites between the exons (the same approach was applied to gp071, which contains the intron-encoded gp072 HNH endonuclease). We utilized the PhagePromoter Galaxy Docker build? for promoter predictions in intergenic regions. This approach revealed 303 promoters, with many achieving the maximum predictive score of 1 (Figure).
Promoter score distribution and weighting scheme. (A) Histogram of all predicted promoter scores across the phage K genome. (B) Original predicted promoter scores (blue line) were exponentially weighted to favor high-scoring promoters (orange line).
We classified regions of high promoter element density within the intergenic spacestermed Intergenic Promoter Regions (IPR)and formulated a scoring algorithm to predict downstream gene expression levels. Individual PhagePromoter scores were exponentially weighted to place more emphasis on high-scoring predictions (FigureB). The IPR score was calculated as the cumulative, weighted scores of all predicted promoters within that IPR. Previous studies have characterized the transcription landscape for K using quantitative mRNA sequencing data.? This allowed us to compare our computational predictions to experimental promoter data and gain insight into the validity of our approach. Significant overlap between the experimental data and computational promoter predictions was evident (FigureA). Our method included 71 of 83 of the promoters experimentally validated in ref ?.
Genome map of phage K, including promoter predictions and selected insertion sites. (A) K genome map with predicted intergenic promoter regions (IPRs) and computed promoter scores. Genes are colored based on previously determined temporal mRNA expression, and experimentally validated promoters are indicated with vertical dotted lines (adapted from ref ). All calculated IPRs are shown as vertical red bars (weighted scores for the 12 highest-scoring IPRs are depicted above/below). (B) Reporter gene (nluc) insertion sites were selected based on the highest-scoring IPRs that had a downstream gene suitable for insertion of the payload. IPRs with no downstream gene were excluded. One insertion site was selected for early-, middle-, and late-expressed genes. The CPS insertion site used in a previous study is indicated.
As posited in our initial hypothesis, the temporal variations in gene expression during an infection cycle could potentially influence the efficacy of specific payloads. The availability of temporal expression data for K? (FigureA,B) motivated us to select high-scoring IPRs for genomic regions expressed at different time points during infection. Genes expressed early, late or at an intermediate time during the infection cycle were determined using available data and the highest-scoring IPRs were identified for each temporal category. Only loci with a proximal downstream gene in the same orientation as the IPR were considered. This constraint was applied to ensure functional context for evaluating promoter activity and to increase biological interpretability of predictions. Insertion sites were selected immediately downstream of the first gene following an IPR. This resulted in 3 IPRs at nucleotide positions 229–518 (early gene), 37505–38806 (late gene) and 119132–119495 (middle gene) (FigureB).
Engineered K::nluc Variants Show Bioluminescence
Expression Corresponding to Predicted Promoter Strengths and Temporal Gene Expression Data
To engineer the three reporter phages that encode nluc at different IPRs (early, middle, late), we used an HR-mediated approach. To this end, we constructed homology donor plasmids (pEDITnluc_early_, pEDITnluc_middle_, pEDITnluc_late_) that guide site-specific recombination (FigureA). HR was performed during infection of the K-susceptible host RN4220 (FigureB). The resulting phage lysates contained a mixture of wildtype and recombinant phage. Recombinant phages were enriched by bioluminescence screening (FigureB,C), isolated, sequenced, and purified by CsCl ultracentrifugation to remove residual NLuc protein from the phage preparation.
Engineering and bioluminescence screening of recombinant K::nluc phages. (A) Schematic maps of plasmid constructs showing the genomic regions and homologous arms used for engineering recombinant K::nluc phages.(LHA = left homology arm; RHA = right homology arm). (B) Lysates obtained from homologous recombination (HR) were screened for escape mutants potentially containing the intended insertion. Lysates were diluted 1000-fold and evaluated for bioluminescence in liquid infection assays (initially 1 × 104 PFU/mL phages and 1 × 106 CFU/mL bacteria). For positive samples, this process was repeated twice. Bioluminescent plaques were then isolated using plaque assays, purified, and validated by Sanger sequencing to confirm correct nluc insertion. (C) Detection of bioluminescent plaques for phages K::nluc Early, K::nluc Mid, and K::nluc Late.
To quantify NLuc expression levels and kinetics, bioluminescence measurements were performed over a timespan of 140 min at 37 °C (FigureA).
Bioluminescence time course measurements. (A) Bioluminescence time course measurements for K::nluc Early, K::nluc Late, and K::nluc Mid, as well as the previously characterized K::nluc CPS, were obtained by calculating fold change in relative light units (RLU) compared to wild-type K infection. Initial conditions were OD600 = 0.01 and 1 × 108 PFU/mL phage titer. Infections were performed in S. aureus PSK at 37 °C for 140 min. Values are background-corrected. (B) Equivalent bioluminescence measurements at 25 °C over 1 h were performed to assess differences in expression timing. Data represent mean ± standard deviation from biological triplicates.
The observed expression levels aligned with our computational predictions. Specifically, the phage variant K::nluc Late exhibited the highest bioluminescence, registering an IPR score of 14.14, followed by K::nluc Mid (IPR score: 12.25) and K::nluc Early (IPR score: 9.02). We additionally measured bioluminescence of the previously engineered K::nluc CPS, ? which has the nluc gene inserted behind the major capsid protein coding gene (cps). The bioluminescence of K::nluc CPS was observed to be approximately 5 times higher than that of K::nluc Late, despite the prediction of only a single, short promoter element (ATAAAT, PhagePromoter score: 0.921, IPR score: 2.51) upstream of cps. Interestingly, an IPR with a higher score (IPR score: 6.406) was identified two genes upstream (1832 bp) of cps. This IPR coincides with a location of an experimentally validated promoter and could potentially be contributing to the elevated transcription levels observed (FigureA). The latency period preceding the onset of expression was short for all phage variants, and all phages produced detectable bioluminescence within the first 10 min of infection. To more clearly delineate kinetic differences at the initial stage of infection, the same assay was conducted at 25 °C for a duration of 1 hour (FigureB). K::nluc Early showed the fastest increase in bioluminescence, followed by K::nluc Mid and K::nluc Late with similar profiles. K::nluc CPS showed the longest time until onset of expression.
Discussion
Phage engineering offers a promising avenue for augmenting the therapeutic applications of naturally occurring phages. While the field has seen a surge in diverse genetic engineering techniques ?,?,?,? the challenge of selecting appropriate payload insertion sites within the phage genome remains underexplored. As mentioned previously, traditional approaches have largely relied on empirical methods, focusing on regions behind late-expressed structural genes such as the major capsid protein. ?,?−? ? Inspired by recent advancements in computational sequence analysis, including ML algorithms for promoter and transcription factor binding site identification, we aimed to explore alternative integration sites.
Promoters serve as pivotal elements in the regulation of gene expression, acting as short sequence motifs typically located upstream of the genes they regulate. These sequences facilitate the binding of σ70-RNA polymerase (RNAP) and initiate transcription at the transcription start site.? Research indicates that initiation rates and RNAP kinetics are influenced by a number of factors such as sequence context, adjacent motifs, and the overall genetic background. These factors can lead to variations in the kinetics of RNA polymerase (RNAP), affecting both its recruitment to the promoter region and its processivity during transcription elongation. Some studies have reported differences of up to 10,000-fold in these kinetic properties, depending on the specific promoter region involved. ?−? ? ? ? Promoters that closely align with the consensus sequence are generally associated with stronger RNAP affinity, a factor that can contribute to elevated transcription levels.? However, strong RNAP binding can sometimes lead to overstabilization of the initiation complex at the transcription start site, resulting in lower levels of gene expression.? This phenomenon has been observed primarily in genes with high baseline expression levels and for small, single nucleotide sequence variations.? Recent studies have also challenged traditional assumptions about promoter orientation and transcription initiation. One study shows an inherent bidirectionality of promoters, putting into question the traditional assumptions of promoter orientation and transcription initiation.? Interestingly, we frequently observed the co-occurrence of predicted promoter regions on both strands, which is possibly a result of this promoter-based DNA-sequence symmetry (FigureA). Moreover, the genomic characteristics of specific bacterial species can introduce additional complexities. For example, the low G/C content in Staphylococcus species may lead to an over prediction of TATA motifs, characteristic of the −10 promoter region. Additionally, purine-pyrimidine base preferences can influence the energetic stability of the DNA double helix, thereby affecting transcription independent of promoter sequence motif conservation.?
In the quest to identify optimal sites for genetic payload integration, we leveraged ML methodologies to augment traditional sequence analysis, i.e., manual inspection of genome annotations, gene orientation, and intergenic regions to heuristically select potential insertion sites. Specifically, we utilized PhagePromoter,? a tool designed for predicting phage promoters, which we selected after a comprehensive evaluation of existing promoter prediction methods (see Table). This tool was particularly advantageous for our study for several reasons. First, PhagePromoter is trained on a large data set comprising both bacterial and phage-encoded promoters, along with their corresponding gene expression levels. This extensive training data allows the algorithm to generate probability scores that serve as a proxy for promoter strength. This is in contrast to conventional, non-ML-based consensus methods, where the general distance to the consensus sequence of a promoter is often used as a metric. We hypothesized that the ML-generated probability scores could serve as a reliable indicator of promoter strength. However, our study has its limitations, primarily due to its inability to consider various complex factors influencing bacterial transcription regulation, such as transcription factors, RNA binding proteins, RNA secondary structure, RNAP regulators, intricate promoter architecture and late transcriptional regulators specific to phages infecting Gram-positive bacteria. ?−? ? ? ? ? ? The inaccuracies in predicting the expression levels of certain genes, like cps, highlight these limitations and suggest the need for further investigation to refine our ML-based promoter prediction model.
Another factor that may influence expression outcomes is whether a phage encodes its own RNA polymerase. While phage K relies entirely on host transcription machinery, other phages encode dedicated RNAPs that recognize distinct promoter motifs. PhagePromoter was trained on a diverse set of phage sequences irrespective of this distinction, which may partially account for such variation. Nevertheless, this represents an important consideration when applying the pipeline across different phage species and a valuable direction for future study.
To enhance the reliability of our predictions, we devised a scoring metric that weights individual promoter scores based on their probability. These weighted scores are then clustered into IPRs, providing a holistic approach aimed at reducing false positives. However, this method is still in its nascent stage and may exclude potentially viable insertion sites, constituting false negatives.
While we have benchmarked our approach using PhagePromoter, the pipeline is inherently modular, allowing for the integration of alternative or additional promoter prediction methods. This adaptability makes our pipeline a valuable asset for future high-throughput, automated phage engineering endeavors.
In this study, we leveraged the genetic manipulability of the well-established Staphylococcus phage K to integrate a bioluminescent nanoluciferase (nluc) reporter payload at selected loci, a strategy informed by previous successful applications of nluc as a reliable gene expression marker. ?,?,? This facilitated the development of recombinant phages K::nluc Early, K::nluc Mid, and K::nluc Late, each embodying the highest-scoring insertion sites within early, middle, and late gene clusters, respectively. Our bioluminescence time-course measurements revealed distinct kinetic profiles for these phages, aligning well with the anticipated temporal expression patterns of their respective gene clusters. ?,?,? These observations not only corroborate the predictions made by our ML-based promoter analysis but also underscore its potential as a robust tool in the rational design of phage engineering projects.
Our work highlights the synergistic potential of combining advanced ML algorithms with traditional sequence analysis to inform and enhance phage engineering strategies. The flexibility inherent in our modular pipeline facilitates the incorporation of diverse promoter prediction methods, allowing for adaptability and refinement in response to evolving analytical tools and techniques. We believe that the development of such integrative approaches is pivotal for advancing high-throughput, automated phage engineering, opening up possibilities for more accurate sequence-based expression predictions. This, in turn, empowers researchers to perform targeted payload insertions with heightened precision and reliability, paving the way for innovations in phage therapy and biotechnology.
Methods
Bacterial Strains and Culture Conditions
S. aureus laboratory strain RN4220 (DSM 26309) was used as an engineering host to generate all K::nluc variants. The propagation strain PSK, a laboratory derivative of S. aureus strain DSM 105272/DPC 5246 (O’Flaherty et al., 2004),? was used as host for phage K and all K::nluc variants. E. coli XL1-Blue MRF′ (Stratagene) was used for plasmid amplification prior to transformation of RN4220. E. coli were grown in Luria–Bertani (LB) liquid medium (10 g/L NaCl, 10 g/L tryptone, 5 g/L yeast extract) overnight at 37 °C (180 rpm). Staphylococcus species were grown in Brain Heart Infusion (BHI, Biolife Italiana) broth overnight at 37 °C (180 rpm).
DNA Amplification
Reaction mixes consisted of 19 μL H_2_O, 2.5 μL of forward and reverse primer respectively (100 μM stock, 2.5 μL used = 5 μM), 1 μL template and 25 μL 2× Phusion High-Fidelity PCR Master mix with HF buffer (Thermo Fisher). The template amount was varied in cases where fragment amplification proved difficult. PCR reactions were conducted with the following conditions: 5 min at 95 °C, 30 s at 95 °C, 30 s at annealing temperature, (30 s per 1000 bp) at 72 °C, 5 or 10 min at 72 °C. Steps 2 to 4 were repeated for 30–35 cycles. The annealing temperature was varied in cases where fragment amplification proved difficult. Gel electrophoresis was performed in 1% agarose gel (1% agarose in 1× TAE buffer) at 110 V. Samples were loaded as follows: 9 μL H_2_O, 2 μL 6× DNA loading dye (Thermo Fisher) and 1 μL PCR product. As DNA ladder, GeneRuler 1 kb DNA Ladder (Thermo Fisher Scientific) was used. The loading dye was supplemented with 100× GelRed Nucleic Acid Gel Stain (Biotium). Detailed primer information is provided in Table S1.
Plasmid Construction
Design of homologous recombination (HR) plasmids were done analogous to ref ?. The plasmid pLEB579 (kindly gifted by T. Takala, University of Helsinki, Finland) was used as a backbone. The pEDIT homology donor templates were constructed by integrating an nluc gene (optimized for S. aureus codon usage, avoiding of Rho-independent termination and an added upstream ribosome binding site: GAGGAGGTAAATATAT), flanked by homology arms corresponding to the intended K insertion site, into the linearized pLEB579 backbone. We chose the insertion site to be immediately downstream of the first gene following our predicted IPR. When the intergenic distance to the next gene (i.e., the second gene after the IPR) was below 20 base pairs (bp), we took measures to avoid disrupting the ribosomal binding site (RBS) of the downstream gene by artificially extending the intergenic region. This involved duplicating the required bases from the end of the preceding gene and inserting them right after the payload, thus preserving the integrity of the subsequent intergenic region. The homology arms were designed to be 200–400 bp in length and where possible to exclude full-length genes of unknown function to mitigate production of potentially toxic gene products during cloning. All synthetic sequences were acquired as GeneArt String DNA Fragments (Thermo Fisher), albeit the spacer sequences for pSELECT, which were ordered as GeneArt Gene Synthesis (Thermo Fisher). Strings and plasmid backbones were amplified by PCR and purified with the Wizard SV Gel and PCR Clean-up System (Promega). Plasmids were assembled using isothermal Gibson Assembly reaction (NEBuilder HiFi DNA Assembly Master Mix). DNA input was 20 ng per kb of fragment; for the 4 kb construct this corresponded to 80 ng total DNA. An overview of all primers and synthetic DNA strains used in this study is provided in Tables S1 and S2, respectively.
Transformation of E. coli Cells
To prepare electrocompetent E. coli, LB medium was inoculated with an overnight culture and grown until an OD_600_ of 0.4–0.6 was reached. The cells were incubated on ice for 30 min, centrifuged (4000 g, 15 min, 4 °C) and pellets resuspended in 10% glycerol. This was repeated three times and cells were stored at −80 °C if not used directly. The entire Gibson reaction (2–10 μL, depending on fragment purity; 80 ng DNA) was transformed into 100 μL electrocompetent E. coli cells at 2.5 kV, 200 Ω, 25 μF. One ml SOC medium (2% (w/v) Tryptone, 0.5% (w/v) Yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl_2_, 20 mM Glucose) was added immediately and the cells in SOC were left at 37 °C for 15 min without agitation, then with agitation (300 rpm) for 1 h. Cells were grown overnight on selective medium at 37 °C and successful transformants isolated.
Transformation of S. aureus Cells
To generate electrocompetent cells, BHI medium was inoculated with S. aureus and grown until an OD_600_ = 1 was reached. Cells were left on ice for 15 min, then washed three times with cold H_2_O and two times with 10% glycerol (3000–5000 g, 10 min, 4 °C). The cells were stored at −80 °C if not used directly. For electroporation, electrocompetent cells were thawed for 5 min on ice and incubated for another 5 min at RT. They were centrifuged (5000 g, 5 min) and the pellet was resuspended in 70 μL electroporation buffer (10% glycerol, 0.5 M sucrose in H_2_O). Plasmid DNA (1–2 μL, 100–1000 ng/μL, purified from E. coli minipreps) was transferred onto MF-Millipore mixed cellulose ester (MCE) membrane filters (25 mm diameter, 0.8 μm pore size; Sigma-Aldrich) in H_2_O, removed after 10–20 min, and then added to 100 μL electrocompetent cells for electroporation at 2.1 kV, 100 Ω, 25 μF. This membrane-assisted desalting step was used to reduce carryover salts prior to electroporation. 1 mL B2 medium (10 g/L casein hydrolysate, 5 g/L d-glucose, 1 g/L potassium phosphate dibasic, 25 g/L NaCl, 25 g/L yeast extract) was added immediately and the cells in B2 were left at 37 °C for 2 h with agitation (300 rpm). The cells were then plated onto B2 plates supplemented with selective antibiotics and stored overnight at 37 °C. Successful transformants were isolated the following day.
Plasmid Sequencing and Evaluation
Sanger sequencing of PCR products (purified with Wizard SV Gel and PCR Cleanup System (Promega)) or plasmids (purified with GenELute Plasmid Miniprep Kit (Sigma-Aldrich) was conducted at Microsynth AG (Switzerland). Results were analyzed with CLC Genomics Workbench.
Phage Handling and Storage
Phages were stored in S/M buffer (5.8 g/L NaCl, 2 g/L MgSO_4_, 6 g/L Tris-HCl (pH 7.5)) at 4 °C. To pick individual plaques from plates, wide bore pipet tips were used to transfer a small amount of the plaque-containing soft-agar into 100 μL S/M buffer.
Spot-On-Lawn and Full Plate Overlay Assays
To perform spot-on-lawn assays, 200 μL of stationary phase bacterial culture were added to 5 mL molten (47 °C) LC soft agar (10 g/L tryptone, 7.5 g/L NaCl, 1% d-glucose, 2 mM MgSO_4_, 10 mM CaCl_2_, 5 g/L yeast extract, 4 g/L agar), vortexed briefly and poured onto a 0.5× BHI plate. The soft agar was left to dry for at least 20 min at RT. 10 μL spots of 10-fold decreasing phage dilutions were placed onto the agar. The spots were left to dry for 1 h and then stored overnight at 37 °C. Full plate overlay assays were performed in a similar manner, with the difference that for each phage dilution, 10 μL were added to 5 mL molten LC soft agar together with the host bacteria and then poured onto a 0.5× BHI agar plate. The bacteria were added first, the LC soft agar tube was vortexed briefly, then the phages were added and transferred to the plate. The plates were dried for at least 30 min and incubated overnight at 37 °C.
Phage Propagation and Purification
To obtain a pure phage lysate, full overlays were performed with the phage on its host bacterium to obtain 3–6 plates with semiconfluent lysis. On the following day, 5 mL S/M buffer were added to each plate and placed at 4 °C for 2–3 h with gentle agitation. The lysate was then collected and centrifuged (10,000 g, 10 min, 4 °C) to pellet cell debris. The supernatant was sterilized by filtration using 0.22 μm filters. Open-Top Thinwall Ultra-Clear Tubes (Beckman Coulter) were loaded with the following CsCl density layers: 1.7 g/mL CsCl, 1.5 g/mL CsCl, 1.35 g/mL CsCl. The densities were obtained by dissolving CsCl salt in S/M buffer. As the topmost layer, the phage lysate sample adjusted to a density of 1.15 g/mL was added. Ultracentrifugation was performed at 20’700 rpm for 2 h at 10 °C. The tubes were then collected and the visible phage band was recovered. The lysate was dialyzed for a minimum of 2× 2 h using the Slide-A-Lyzer Mini Dialysis Device (Thermo Scientific).
Engineering of K::nluc
The engineering process of K::nluc through HR was conducted analogously to the methods outlined in ref ?. Briefly, the corresponding pEDIT plasmid for each engineered phage was transformed into S. aureus RN4220, creating the engineering strains RN4220 (pEDIT). These strains were then infected with serial dilutions of phage K using a soft-agar overlay method, with initial concentrations of 1 × 10^4^ PFU/ml phages and 1 × 10^6^ CFU/ml bacteria, to yield high titer lysates, obtained by washing semiconfluent plates with S/M buffer as previously described. Subsequent bioluminescence assays were conducted on the lysates using S. aureus PSK as the host. Lysates were diluted 1000-fold and evaluated for bioluminescence emission in liquid infection assays. In these assays, luminescence was typically detected in one out of ten wells, corresponding to an estimated recombinant frequency of approximately 1 in 10,000, which is in line with previous observations using CRISPR-Cas counterselection.? Positive samples underwent two additional rounds of this dilution and assessment process. This assay was iteratively performed until the detection of single, bioluminescent plaques on full-plate overlays was possible. Finally, individual bioluminescent plaques were isolated, purified, and Sanger sequenced to confirm the correct and precise insertion of the nanoluciferase gene.
Bioluminescence Assays
For expression analysis of the purified recombinant phages, stationary phase bacterial cultures were diluted to OD_600_ 0.01, inoculated with 1 × 10^8^ PFU/ml phage and incubated at 37 °C (180 rpm agitation). Bioluminescence measurements were taken by combining 25 μL of the sample solution with an equal volume of prepared buffer-reconstituted nluc substrate as detailed by the manufacturer (Nano-Glo Luciferase Assay System; Promega). Measurements were taken every 10 min in Nunc F96 MicroWell 446 plates using a GloMax navigator luminometer (Promega) with 5 s integration time and 2 s delay. To determine the background-corrected fold-change in relative light units (RLU) for each phage, measurements were normalized to a control reaction of each phage in BHI medium and the fold-change was calculated as the difference in RLU to an infection of the same strain with wildtype K. All measurements were performed in triplicate. To determine plaques containing recombinant, nanoluciferase-expressing phage from full plate overlays, 500 μL of Nano-GloR substrate were spread onto the plate. Plates were photographed using Gel Doc XR+ Gel Documentation System, once with no illumination and 50 s exposure time, and once again using trans-white illumination and exposure of 0.2 s. Images were overlaid to determine the location of plaques showing bioluminescence on the full plate. For those engineered phages where no bioluminescent plaques were detectable, enrichment steps were conducted by liquid infection of PSK at an initial MOI of 0.01 (1 × 10^4^ PFU/mL phages, 1 × 10^6^ CFU/mL bacteria) until a significant rise in bioluminescence (>10^3^ RLU) was detectable. The solution was then diluted 1:1000-fold and enrichment repeated. This was continued until bioluminescent plaques were detectable in the plaque assay.
Bioinformatics
Evaluation of the various promoter prediction programs was conducted with the software versions as they were online available in the time between 02.06.2022 and 08.06.2022. The full genome sequence and annotations of K were acquired from the GenBank file KF766114.1. For genes annotated as split by short internal open reading frames (e.g., intron-encoded HNH endonucleases), we treated the entire locus as a single continuous coding sequence to avoid false identification of intergenic promoter regions. This applied, for example, to gp141 and gp071 in the phage K genome. The PhagePromoter Galaxy Docker Build (Galaxy Version 0.1.0) was used for generating promoter predictions using the following parameters: Search both strands yes, threshold 0.5, phage family Myoviridae, host bacteria genus Staphylococcus aureus, phage type virulent. Programming, plotting and sequence analyses were done in Jupyter notebook version 6.1.4 running python 3.8.5. CLC Genomics Workbench version 20.0.4 was used for additional sequence analyses such as primer and string design as well as evaluation of sanger sequencing results. OpenAI’s ChatGPT 4? was used as a tool to assist with formatting and editing the manuscript. This involved iterative refinements to ensure clarity and conciseness of the content presented.
Supplementary Material
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Prestinaci F.Pezzotti P.Pantosti A.Antimicrobial resistance: a global multifaceted phenomenon Pathog. Global Health 201510930931810.1179/2047773215 Y.0000000030 PMC 476862326343252 · doi ↗ · pubmed ↗
- 2Aslam B.Wang W.Arshad M. I.Khurshid M.Muzammil S.Rasool M. H.Nisar M. A.Alvi R. F.Aslam M. A.Qamar M. U.Antibiotic resistance: a rundown of a global crisis Infect. Drug. Resist 201811164510.2147/IDR.S 17386730349322 PMC 6188119 · doi ↗ · pubmed ↗
- 3De Oliveira D. M.Forde B. M.Kidd T. J.Harris P. N.Schembri M. A.Beatson S. A.Paterson D. L.Walker M. J.Antimicrobial resistance in ESKAPE pathogens Clin. Microbiol. Rev 202033 e 00181–1910.1128/CMR.00181-1932404435 PMC 7227449 · doi ↗ · pubmed ↗
- 4Hatfull G. F.Dedrick R. M.Schooley R. T.Phage therapy for antibiotic-resistant bacterial infections Annu. Rev. Med 20227319721110.1146/annurev-med-080219-12220834428079 · doi ↗ · pubmed ↗
- 5Suh G. A.Lodise T. P.Tamma P. D.Knisely J. M.Alexander J.Aslam S.Barton K. D.Bizzell E.Totten K. M. C.Campbell J. L.Considerations for the use of phage therapy in clinical practice Antimicrob. Agents Chemother 202266 e 02071–2110.1128/aac.02071-2135041506 PMC 8923208 · doi ↗ · pubmed ↗
- 6Schooley R. T.Biswas B.Gill J. J.Hernandez-Morales A.Lancaster J.Lessor L.Barr J. J.Reed S. L.Rohwer F.Benler S.Development and use of personalized bacteriophage-based therapeutic cocktails to treat a patient with a disseminated resistant Acinetobacter baumannii infection Antimicrob. Agents Chemother 201761 e 00954–1710.1128/AAC.00954-1728807909 PMC 5610518 · doi ↗ · pubmed ↗
- 7Chan B. K.Turner P. E.Kim S.Mojibian H. R.Elefteriades J. A.Narayan D.Phage treatment of an aortic graft infected with Pseudomonas aeruginosa Evol. Med. Public Health 20182018606610.1093/emph/eoy 00529588855 PMC 5842392 · doi ↗ · pubmed ↗
- 8Dedrick R. M.Guerrero-Bustamante C. A.Garlena R. A.Russell D. A.Ford K.Harris K.Gilmour K. C.Soothill J.Jacobs-Sera D.Schooley R. T.Engineered bacteriophages for treatment of a patient with a disseminated drug-resistant Mycobacterium abscessus Nat. Med 20192573073310.1038/s 41591-019-0437-z 31068712 PMC 6557439 · doi ↗ · pubmed ↗
