ORF1ab codon frequency model predicts host-pathogen relationship in orthocoronavirinae
Phillip E. Davis, Joseph A. Russell

TL;DR
This paper shows that codon frequencies in the ORF1ab gene can predict whether a coronavirus is a human pathogen.
Contribution
A codon frequency model built only on ORF1ab data predicts human-pathogen status in Orthocoronavirinae with 76.74% average precision and 85.96% average recall on withheld test groups.
Findings
ORF1ab codon frequency models achieved 76.74% average precision and 85.96% average recall on the withheld test set groups when predicting human-pathogen status.
Five specific codons were identified as critical features for model performance.
Alternative models using other viral sequences or features performed poorly in comparison.
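To make the reported metrics concrete, the sketch below computes precision and recall per withheld group and then averages them. It assumes that "average" in the paper means an unweighted mean over held-out groups, which the summary does not spell out. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = human pathogen, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def mean_over_groups(groups):
    """Unweighted mean of per-group precision and recall (an assumed reading of 'average')."""
    scores = [precision_recall(y_true, y_pred) for y_true, y_pred in groups]
    mean_precision = sum(p for p, _ in scores) / len(scores)
    mean_recall = sum(r for _, r in scores) / len(scores)
    return mean_precision, mean_recall


# Toy held-out groups (invented labels, for illustration only).
toy_groups = [
    ([1, 1, 0, 0], [1, 0, 1, 0]),
    ([1, 0, 0, 1], [1, 1, 0, 1]),
]
avg_precision, avg_recall = mean_over_groups(toy_groups)
print(f"precision={avg_precision:.4f} recall={avg_recall:.4f}")
```

Precision penalizes non-pathogens flagged as pathogens, while recall penalizes missed pathogens, so a recall higher than precision indicates a model that errs toward flagging.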
Abstract
Predicting phenotypic properties of a virus directly from its sequence data is an attractive goal for viral epidemiology. Here, we focus narrowly on the Orthocoronavirinae clade and demonstrate models that are powerfully predictive for a human-pathogen phenotype with 76.74% average precision and 85.96% average recall on the withheld test set groups, using only ORF1ab codon frequencies. We show alternative examples for other viral coding sequences and feature representations that do not perform well and discuss what distinguishes the models that are performant. These models point to a small subset of features, specifically 5 codons, that are critical to the success of the models. We discuss and contextualize how this observation may fit within a larger model for the role of translation in virus-host agreement.
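As an illustration of the feature representation described above, the following sketch turns a coding sequence into a 64-dimensional codon frequency vector. It is a minimal example under stated assumptions, not the authors' pipeline: it assumes the input is already in frame, and it does not model the -1 ribosomal frameshift between ORF1a and ORF1b, whose handling this summary does not describe. The name `codon_frequencies` and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to the same feature layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(cds):
    """Return a dict mapping each of the 64 codons to its relative frequency."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA alphabets
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    triplets = (seq[i:i + 3] for i in range(0, usable, 3))
    # Skip codons containing ambiguous bases such as N.
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


# Toy in-frame sequence (not real ORF1ab data): ATG GCC ATG TAA
freqs = codon_frequencies("ATGGCCATGTAA")
print(freqs["ATG"], freqs["GCC"], freqs["TAA"])
```

A vector like this, one per genome, could then be passed to any standard classifier. The five codons the abstract highlights are not named in this summary, so they are not singled out here.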
Taxonomy
Topics: RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies · Interferon and immune responses
