Impact of phylogeny on structural contact inference from protein sequence data
Nicola Dietler, Umberto Lupo, Anne-Florence Bitbol

TL;DR
This study examines how phylogenetic correlations among homologous protein sequences affect the inference of structural contacts from multiple sequence alignments. Global inference methods (Potts models) prove more robust to these correlations than local methods based on covariance or mutual information, which may explain the success of global methods.
Contribution
The paper demonstrates that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations in protein sequence data than local methods based on covariance or mutual information. The result is supported by analyses of both synthetic and natural data.
Findings
Global methods are more resilient than local methods to phylogenetic correlations, whether or not phylogenetic corrections are used.
Phylogenetic correlations can cause false positive contact predictions.
Mutations that occur early in the phylogeny lead to spurious contact signals.
Abstract
Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino-acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that…
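To make the notion of a local method concrete, here is a minimal sketch of a mutual-information score between two columns of an alignment. It is an illustration only, not the authors' implementation: the function name `mutual_information` and the toy alignment `toy_msa` are hypothetical, frequencies are raw counts with no pseudocounts, and no sequence reweighting or phylogenetic correction is applied.

```python
from collections import Counter
from math import log


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length strings. Frequencies are plain empirical
    counts (assumption: no pseudocounts, no sequence weights).
    """
    n = len(msa)
    f_i = Counter(seq[i] for seq in msa)             # single-site counts, column i
    f_j = Counter(seq[j] for seq in msa)             # single-site counts, column j
    f_ij = Counter((seq[i], seq[j]) for seq in msa)  # pair counts

    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        mi += p_ab * log(p_ab * n * n / (f_i[a] * f_j[b]))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# columns 0 and 3 vary independently, column 2 is fully conserved.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]

coupled = mutual_information(toy_msa, 0, 1)      # equals log(2)
independent = mutual_information(toy_msa, 0, 3)  # equals 0
```

A score like this treats each pair of sites in isolation, so it cannot tell whether two columns covary because the sites are in contact or because the sequences share ancestry. Global methods such as Potts models instead fit all pairwise couplings jointly.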
Taxonomy
Topics: Genomics and Phylogenetic Studies · Bioinformatics and Genomic Networks · Genetic diversity and population structure
