DCA for genome-wide epistasis analysis: the statistical genetics perspective
Chen-Yi Gao, Fabio Cecconi, Angelo Vulpiani, Hai-Jun Zhou, Erik Aurell

TL;DR
This paper examines when Direct Coupling Analysis (DCA) can be applied to genome-wide epistasis studies, arguing that meaningful results can be expected when the population is in the Quasi-Linkage Equilibrium (QLE) phase but not, for instance, in a phase of Clonal Competition.
Contribution
It clarifies the conditions under which DCA is effective in genome analysis, links the exponential (Potts model) distributions underlying DCA to population genetic phases, and compares DCA couplings with correlations in Streptococcus pneumoniae genome data.
Findings
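DCA can be expected to yield meaningful results when a population is in Quasi-Linkage Equilibrium (QLE), but not, for instance, in a phase of Clonal Competition.
DCA couplings are compared with correlations obtained in a study of about 3,000 Streptococcus pneumoniae genomes.
Whether DCA is applicable at the genome scale depends on fundamental issues of population genetics, in particular the phase the population is in.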
Abstract
Direct Coupling Analysis (DCA) is a widely used method that leverages statistical information from many similar biological systems to draw meaningful conclusions about each system separately. DCA has been applied with great success to sequences of homologous proteins and, more recently, to whole-genome population-wide sequencing data. Here we argue that the use of DCA on the genome scale is contingent on fundamental issues of population genetics. DCA can be expected to yield meaningful results when a population is in the Quasi-Linkage Equilibrium (QLE) phase studied by Kimura and others, but not, for instance, in a phase of Clonal Competition. We discuss how the exponential (Potts model) distributions emerge in QLE, and compare couplings to correlations obtained in a study of about 3,000 genomes of the human pathogen Streptococcus pneumoniae.
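For readers unfamiliar with the terms, the "exponential (Potts model) distribution" and the distinction between couplings and correlations can be written out as below. This is the standard textbook form, given here as an illustrative sketch; the symbols (s_i for the allele at locus i, L for the number of loci, h_i, J_ij, f_i, f_ij, c_ij) are notation assumed for this note and are not taken from the abstract.

```latex
% Potts-model (exponential family) distribution over genomes s = (s_1, ..., s_L):
% h_i are single-locus fields, J_{ij} are the pairwise couplings that DCA infers.
P(s_1,\dots,s_L) \;=\; \frac{1}{Z}\,
  \exp\!\Big( \sum_{i=1}^{L} h_i(s_i) \;+\; \sum_{1 \le i < j \le L} J_{ij}(s_i,s_j) \Big)

% Pairwise correlations, by contrast, are computed directly from the observed
% single-locus frequencies f_i(a) and two-locus frequencies f_{ij}(a,b):
c_{ij}(a,b) \;=\; f_{ij}(a,b) \;-\; f_i(a)\, f_j(b)
```

The correlations c_ij mix direct and indirect effects (two loci can be correlated because both interact with a third), whereas the couplings J_ij are meant to capture only the direct pairwise dependence. Comparing the two is the kind of analysis the abstract describes for the Streptococcus pneumoniae data.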
Taxonomy
Topics: Evolution and Genetic Dynamics · Genetic diversity and population structure · Genetic Associations and Epidemiology
