# Machine learning reveals sequence and methylation determinants of SaCas9–PAM interactions in bacteria

**Authors:** Dalton T Ham, Tyler S Browne, Claire Q Zhang, Gary W Foo, Aathavan S Uruthirapathy, Gregory B Gloor, David R Edgell

PMC · DOI: 10.1093/nar/gkaf1520 · Nucleic Acids Research · 2026-01-15

## TL;DR

This study uses machine learning to identify factors affecting SaCas9 activity in bacteria, including DNA sequence and methylation patterns.

## Contribution

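The study trains a machine learning model (crispr macHine trAnsfer Learning) on large-scale SaCas9/sgRNA activity datasets generated in bacteria. It uses the model, together with plasmid cleavage assays, to show how PAM-flanking DNA sequence and adenine methylation influence SaCas9 activity.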
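The abstract reports that extending the model input to include the [+1] and [+2] bases downstream of the PAM improved predictive performance. A minimal sketch of what "extending the input" means for a sequence model is below. It assumes one-hot encoding and a 21-nt protospacer, which are common choices for Cas9 activity models but are not confirmed here as the authors' exact featurization. `one_hot` and `encode_target` are hypothetical helper names.

```python
BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """One-hot encode a DNA string as rows of [A, C, G, T] indicators.

    Raises ValueError for ambiguous bases such as N.
    """
    rows = []
    for base in seq.upper():
        idx = BASES.find(base)
        if idx == -1:
            raise ValueError(f"unsupported base: {base!r}")
        row = [0, 0, 0, 0]
        row[idx] = 1
        rows.append(row)
    return rows


def encode_target(protospacer: str, pam: str, downstream: str = "") -> list[list[int]]:
    """Hypothetical encoder: protospacer + PAM, optionally extended by the [+1]/[+2] bases."""
    return one_hot(protospacer + pam + downstream)


# Hypothetical 21-nt protospacer, a TTGAAT (NNGRRN) PAM, and a TT [+1]/[+2] dinucleotide.
spacer = "ACGTACGTACGTACGTACGTA"
without_flank = encode_target(spacer, "TTGAAT")
with_flank = encode_target(spacer, "TTGAAT", "TT")
print(len(without_flank), len(with_flank))  # 27 29
```

The only change between the two encodings is two extra input positions. This is how a model can learn that T-rich dinucleotides at [+1]/[+2] correlate with higher activity.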

## Key findings

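- T-rich dinucleotides at positions [+1] and [+2], immediately downstream of the canonical NNGRRN PAM, correlate with higher SaCas9 activity in bacteria.
- Adenine methylation at GATC motifs inhibits SaCas9 activity. Sites with a 5′-NNGGAT[C]-3′ PAM [+1] sequence showed ~10-fold reduced activity in pooled sgRNA experiments, and plasmid cleavage assays in DAM-deficient E. coli confirmed the effect.
- Avoidance of methylated PAMs may reflect an evolutionary adaptation that lets SaCas9 discriminate self from nonself DNA, or counter methylation used by phages and plasmids as an antirestriction strategy.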
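To make the PAM positions concrete, here is a minimal sketch that scans a sequence for NNGRRN PAMs (R = A or G) and reports the [+1]/[+2] dinucleotide after each one. Several simplifications are assumptions and are not taken from the paper:

- It scans the forward strand only.
- It assumes a 21-nt protospacer directly 5′ of the PAM.
- `t_count` is a hypothetical T-richness score, not the authors' model.

```python
import re
from dataclasses import dataclass

# Lookahead so overlapping NNGRRN matches are all reported (R = A or G).
PAM_REGEX = re.compile(r"(?=([ACGT]{2}G[AG]{2}[ACGT]))")

# Assumption: 21-nt protospacer immediately 5' of the PAM (illustrative only).
SPACER_LEN = 21


@dataclass
class PamSite:
    pam_start: int    # 0-based index of the first PAM base on the forward strand
    protospacer: str  # SPACER_LEN nt upstream of the PAM
    pam: str          # the 6-nt NNGRRN match
    downstream: str   # bases at positions [+1] and [+2] after the PAM


def find_pam_sites(seq: str) -> list[PamSite]:
    """Forward-strand scan for NNGRRN PAMs that have a full protospacer and two downstream bases."""
    seq = seq.upper()
    sites = []
    for m in PAM_REGEX.finditer(seq):
        start = m.start(1)
        end = start + 6
        if start < SPACER_LEN or end + 2 > len(seq):
            continue  # skip sites lacking a full protospacer or the [+1]/[+2] bases
        sites.append(PamSite(start, seq[start - SPACER_LEN:start], m.group(1), seq[end:end + 2]))
    return sites


def t_count(dinucleotide: str) -> int:
    """Hypothetical T-richness score: number of T bases at [+1] and [+2]."""
    return dinucleotide.count("T")


seq = "A" * 21 + "TTGAATTT"
for site in find_pam_sites(seq):
    print(site.pam_start, site.pam, site.downstream, t_count(site.downstream))
# prints: 21 TTGAAT TT 2
```

A real analysis would also scan the reverse complement. It would also use the sgRNA design rules of the experiment rather than a fixed spacer length.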

## Abstract

Cas9 nucleases defend bacteria against invading DNA and can be used with single guide RNAs (sgRNAs) as antimicrobials and genome-editing tools. However, bacterial applications are limited by incomplete knowledge of Cas9–target interactions. Here, we generated large-scale Staphylococcus aureus Cas9 (SaCas9)/sgRNA activity datasets in bacteria and trained a machine learning model (crispr macHine trAnsfer Learning) to predict SaCas9 activity. Incorporating downstream sequences flanking the canonical NNGRRN protospacer adjacent motif (PAM) at positions [+1] and [+2] improved predictive performance, with T-rich dinucleotides at these positions correlating with higher in vivo activity. Crucially, SaCas9 showed ~10-fold reduced activity at sites containing a 5′-NNGGAT[C]-3′ PAM [+1] sequence in pooled sgRNA experiments in Escherichia coli and Citrobacter rodentium. Plasmid cleavage assays in DNA adenine methyltransferase (DAM)-deficient E. coli confirmed that adenine methylation at GATC motifs inhibited SaCas9 activity. Removal of a DAM site within a PAM sequence enhanced cleavage, while introduction of a site reduced activity, directly linking adenine methylation to SaCas9 activity. These findings demonstrate that machine learning can uncover biologically relevant determinants of Cas9 activity. Avoidance of methylated PAMs may reflect an evolutionary adaptation by SaCas9 to discriminate self from nonself or to counter methylation as a phage and plasmid antirestriction strategy.
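In the 5′-NNGGAT[C]-3′ case, the last three PAM bases (GAT) and the [+1] base (C) together form a GATC motif, the site DAM methylates at adenine. A minimal sketch of flagging such overlaps is below. `pam_overlaps_dam_site` is a hypothetical helper, not the authors' code. It flags *any* GATC that starts inside the 6-nt PAM, which is broader than the single NNGGAT[C] configuration reported. It also catches, for example, NNGRGA followed by [TC]. Whether those other overlaps reduce activity to the same degree is not stated in this summary.

```python
DAM_MOTIF = "GATC"
PAM_LEN = 6


def pam_overlaps_dam_site(pam: str, downstream: str) -> bool:
    """Hypothetical check: True if a GATC motif starts within the PAM,
    including motifs completed by the downstream [+1]/[+2] bases."""
    if len(pam) != PAM_LEN:
        raise ValueError("expected a 6-nt PAM")
    window = (pam + downstream).upper()
    # A motif starting at index < PAM_LEN shares at least one base with the PAM.
    return any(window.startswith(DAM_MOTIF, i) for i in range(PAM_LEN))


print(pam_overlaps_dam_site("AAGGAT", "CT"))  # True  (NNGGAT[C])
print(pam_overlaps_dam_site("AAGGAT", "TT"))  # False
print(pam_overlaps_dam_site("AAGAGA", "TC"))  # True  (NNGRGA[TC])
```

Within NNGRRN alone, a GATC cannot occur, because positions 4 and 5 must be purines. So any DAM site touching the PAM must extend into the downstream bases, which is why the [+1]/[+2] context matters.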


## Linked entities

- **Species:** Staphylococcus aureus (taxon 1280), Escherichia coli (taxon 562), Citrobacter rodentium (taxon 67825)

## Full-text entities

- **Species:** Staphylococcus aureus (species) [taxon 1280], Escherichia coli (E. coli, species) [taxon 562], Citrobacter rodentium (species) [taxon 67825]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12805903/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12805903/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12805903/full.md

---
Source: https://tomesphere.com/paper/PMC12805903