Temporal epistasis inference from more than 3,500,000 SARS-CoV-2 Genomic Sequences
Hong-Li Zeng, Yue Liu, Vito Dichio, Erik Aurell

TL;DR
This study applies Direct Coupling Analysis to over 3.5 million SARS-CoV-2 genomes to identify epistatic interactions, revealing stable long-range links and transient evolutionary effects over time.
Contribution
It demonstrates the use of DCA on a massive genomic dataset to infer epistasis and discusses the stability and temporal dynamics of inferred interactions.
Findings
DCA terms are more stable over time than correlations.
Correlations are enriched for phylogenetic effects and short-range dependencies.
DCA reveals long-range genomic interactions, especially involving Spike mutations.
Abstract
We use Direct Coupling Analysis (DCA) to determine epistatic interactions between loci of variability of the SARS-CoV-2 virus, segmenting genomes by month of sampling. We use full-length, high-quality genomes from the GISAID repository up to October 2021, in total over 3,500,000 genomes. We find that DCA terms are more stable over time than correlations, but nevertheless change over time as mutations disappear from the global population or reach fixation. Correlations are enriched for phylogenetic effects, and in particular for statistical dependencies at short genomic distances, while DCA brings out links at longer genomic distances. We discuss the validity of a DCA analysis under these conditions in terms of a transient Quasi-Linkage Equilibrium state. We identify putative epistatic interactions involving loci in Spike.
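The contrast the abstract draws between correlations and DCA terms, computed separately for each sampling month, can be illustrated with a minimal sketch. The abstract does not state which DCA inference scheme was used, so this sketch swaps in the simple mean-field approximation (couplings taken as the negative inverse of the regularized covariance matrix) on binary major/minor allele indicators. The function names, the `ridge` parameter, and the synthetic data are all hypothetical and serve only as illustration.

```python
import numpy as np

def pairwise_covariance(X):
    """Covariance between loci; X has one row per genome, one 0/1 column per locus."""
    return np.cov(X, rowvar=False, bias=True)

def mean_field_couplings(X, ridge=1e-3):
    """Naive mean-field DCA: J = -(C + ridge * I)^(-1), with the diagonal zeroed.

    Assumption: mean-field is used here for brevity; it is not claimed to be
    the inference method of the paper. The ridge term keeps C invertible when
    a locus has no variability in the sample (mutation absent or fixed).
    """
    C = pairwise_covariance(X)
    J = -np.linalg.inv(C + ridge * np.eye(C.shape[0]))
    np.fill_diagonal(J, 0.0)
    return J

def couplings_by_month(X, months):
    """Infer one coupling matrix per sampling month (months: one label per genome)."""
    months = np.asarray(months)
    return {m: mean_field_couplings(X[months == m]) for m in np.unique(months)}

# Synthetic example (not real data): a chain of three loci, 0 -> 1 -> 2.
# Locus 2 depends on locus 0 only through locus 1.
rng = np.random.default_rng(0)
n = 20000
x0 = rng.integers(0, 2, n)
x1 = np.where(rng.random(n) < 0.1, 1 - x0, x0)  # copy of x0 with 10% flips
x2 = np.where(rng.random(n) < 0.1, 1 - x1, x1)  # copy of x1 with 10% flips
X = np.column_stack([x0, x1, x2])

C = pairwise_covariance(X)   # C[0, 2] is clearly nonzero (indirect dependence)
J = mean_field_couplings(X)  # J[0, 2] is close to zero (no direct coupling)
```

In this toy chain, loci 0 and 2 are correlated only through locus 1: the covariance between them is sizeable, while the inferred direct coupling is near zero. This is the sense in which a DCA-type analysis is meant to separate direct from indirect statistical dependencies.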
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic Mapping and Diversity in Plants and Animals · Evolution and Genetic Dynamics
