Statistical challenges in the analysis of sequence and structure data for the COVID-19 spike protein
Shiyu He, Samuel W.K. Wong

TL;DR
This paper develops statistical models to analyze the evolution and structural changes of the SARS-CoV-2 spike protein, revealing how certain mutation combinations may spread more rapidly.
Contribution
It introduces Bayesian hierarchical models and sampling methods to study the temporal, spatial, and structural dynamics of spike protein mutations.
Findings
D614G variants are spreading widely.
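Variants carrying D614G together with S477N or A222V may spread even more rapidly than D614G alone, according to the model estimates.
Sampling-based structural analysis explores how commonly observed mutations might change the S-protein's 3-D conformation.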
Abstract
As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences into representative clusters. We then apply sampling methods to investigate possible changes to the S-protein's 3-D structure as a result of commonly observed mutations. While the increasing spread of D614G variants has been noted in other research, our results also show that the co-occurring mutations of D614G together with S477N or A222V may spread even more rapidly, as quantified by our model estimates.
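The mutation labels in the abstract (D614G, S477N, A222V) follow the standard substitution notation: the reference amino acid, the 1-based position in the protein, then the replacement amino acid. The sketch below shows one way to parse such labels and check whether a sequence carries a given combination. The function names and the five-residue toy sequences are hypothetical illustrations, not the authors' code or data, and real use would assume sequences already aligned to the reference S-protein.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    ref: str  # reference amino acid (one-letter code)
    pos: int  # 1-based position in the protein sequence
    alt: str  # replacement amino acid (one-letter code)


# Matches labels such as "D614G": one letter, digits, one letter.
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label like 'D614G' into (ref, pos, alt)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def carries(sequence: str, label: str) -> bool:
    """True if the sequence has the replacement residue at the labelled position.

    Assumes the sequence is aligned to the reference, so positions line up.
    """
    sub = parse_substitution(label)
    return len(sequence) >= sub.pos and sequence[sub.pos - 1] == sub.alt


def carries_all(sequence: str, labels: list[str]) -> bool:
    """True if the sequence carries every substitution in labels (co-occurrence)."""
    return all(carries(sequence, label) for label in labels)


# Toy data only: a made-up 5-residue "reference" and a made-up variant.
toy_reference = "MKDLA"
toy_variant = "MKGLS"  # differs from the toy reference at positions 3 and 5

print(parse_substitution("D614G"))
print(carries_all(toy_variant, ["D3G", "A5S"]))
print(carries_all(toy_reference, ["D3G", "A5S"]))
```

A check like `carries_all` is one simple way to flag sequences with co-occurring mutations before grouping them; the clustering and Bayesian hierarchical modelling described in the abstract are separate steps not shown here.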
Taxonomy
Topics: SARS-CoV-2 and COVID-19 Research · vaccines and immunoinformatics approaches · Influenza Virus Research Studies
