Replication of SARS-CoV-2 mutation analysis suggests differences in per-protein mutation characteristics
William Svoboda, Brendan McManamon, Sara Schwartz

TL;DR
This study partially replicates a 2021 SARS-CoV-2 mutation analysis by Wang, R. et al., confirming key mutation patterns and highlighting differences in mutation characteristics across viral proteins, especially the spike protein.
Contribution
It partially replicates earlier mutation findings and suggests that factors other than protein length influence per-protein mutation rates.
Findings
The spike protein accounts for roughly 24% of all recorded mutations
Mutation rates vary across proteins in ways that protein length alone does not explain
The replication recapitulates the features of Figure S1 from the original study
Abstract
The increasing spread of COVID-19, caused by the virus SARS-CoV-2, raises concerns about the extent to which mutations have occurred across the viral genome. We present a partial replication of a 2021 study by Wang, R. et al. that identified four substrains and eleven top mutations in the United States. We analyze a portion of the authors' data set to recreate Figure S1 from that paper, recovering the features observed in the original figure. We further generate a summary of mutation characteristics for each of the 26 named proteins and confirm the prominence of the spike protein, which accounts for roughly 24% of all recorded mutations. Our analysis suggests that factors other than protein length may affect per-protein mutation rates.
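The per-protein summary and the comparison against protein length can be illustrated with a minimal sketch. It assumes each recorded mutation has already been annotated with the protein it falls in; the input format, the function name `per_protein_summary`, and the length-proportional baseline are our assumptions for illustration, not the authors' method. For each protein it reports the share of mutations and the ratio of observed mutations to the number expected if mutations were spread uniformly over all residues; ratios well away from 1 are the kind of signal that would point to factors beyond length.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def per_protein_summary(
    mutation_proteins: Iterable[str],
    protein_lengths: Mapping[str, int],
) -> dict[str, dict[str, float]]:
    """Summarize recorded mutations per protein (illustrative sketch).

    mutation_proteins: one protein name per recorded mutation (assumed format).
    protein_lengths: protein name -> length in amino acids. Only these
        proteins are summarized, and totals are computed over them.
    """
    counts = Counter(mutation_proteins)
    # Restrict totals to proteins with known lengths so that shares and
    # expected counts refer to the same set of proteins.
    total = sum(counts.get(p, 0) for p in protein_lengths)
    total_length = sum(protein_lengths.values())

    summary: dict[str, dict[str, float]] = {}
    for protein, length in protein_lengths.items():
        observed = counts.get(protein, 0)
        # Hypothetical null model: mutations distributed in proportion to length.
        expected = total * length / total_length if total_length else 0.0
        summary[protein] = {
            "count": observed,
            "share": observed / total if total else 0.0,
            "per_residue": observed / length if length else 0.0,
            # > 1: more mutations than length alone predicts; < 1: fewer.
            "observed_to_expected": observed / expected if expected else 0.0,
        }
    return summary
```

With real data, `protein_lengths` would hold the 26 named proteins; mutations annotated to anything outside that mapping (for example, non-coding regions) are ignored by this sketch.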
