Model-Free Inference for Characterizing Protein Mutations through a Coevolutionary Lens
Fan Yang, Zhao Ren, Wen Zhou, Kejue Jia, Robert Jernigan

TL;DR
This paper introduces a model-free statistical inference framework for protein contact prediction from MSA data. The framework quantifies the uncertainty of predicted contacts and identifies the contributions of individual amino acids.
Contribution
It develops a novel partial correlation-based testing framework for contact prediction that does not rely on traditional model assumptions.
Findings
Validates control of Type I errors in simulations
Demonstrates high power in numerical experiments
Applies the method successfully to real protein data
Abstract
Multiple sequence alignment (MSA) data play a crucial role in the study of protein mutations, with contact prediction being a notable application. Existing methods are often model-based or algorithmic and typically do not incorporate statistical inference to quantify the uncertainty of the prediction outcomes. To address this, we propose a novel framework that transforms the task of contact prediction into a statistical testing problem. Our approach is motivated by the partial correlation for continuous random variables. With one-hot encoding of MSA data, we are able to construct a partial correlation graph for multivariate categorical variables. In this framework, two connected nodes in the graph indicate that the corresponding positions on the protein form a contact. A new spectrum-based test statistic is introduced to test whether two positions are partially correlated. Moreover, the…
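The pipeline the abstract outlines (one-hot encode the MSA, then ask whether two positions remain associated once the other positions are accounted for) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' method: the function names, the alphabet, the ridge-regularized residualization, and the use of the largest singular value as a "spectral" summary are all hypothetical stand-ins. In particular, the sketch produces only a descriptive score, with no null distribution or p-value.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

def one_hot_msa(seqs, alphabet=ALPHABET):
    """Encode aligned sequences as an (n_seqs, n_pos, q-1) array.

    The last alphabet symbol is dropped as a reference level so that the
    indicator columns of a position are not exactly collinear.
    """
    idx = {a: k for k, a in enumerate(alphabet)}
    n, length, q = len(seqs), len(seqs[0]), len(alphabet)
    X = np.zeros((n, length, q), dtype=float)
    for s, seq in enumerate(seqs):
        for p, a in enumerate(seq):
            X[s, p, idx[a]] = 1.0
    return X[:, :, :-1]

def partial_cross_cov(X, i, j, ridge=1e-3):
    """Cross-covariance of positions i and j after linearly regressing out
    all other positions (ridge-regularized; an illustrative choice)."""
    n, length, _ = X.shape
    rest = [p for p in range(length) if p not in (i, j)]
    Z = X[:, rest, :].reshape(n, -1)
    Z = Z - Z.mean(axis=0)
    Xi = X[:, i, :] - X[:, i, :].mean(axis=0)
    Xj = X[:, j, :] - X[:, j, :].mean(axis=0)
    G = Z.T @ Z + ridge * n * np.eye(Z.shape[1])
    Ri = Xi - Z @ np.linalg.solve(G, Z.T @ Xi)  # residual block for position i
    Rj = Xj - Z @ np.linalg.solve(G, Z.T @ Xj)  # residual block for position j
    return Ri.T @ Rj / n

def spectral_score(X, i, j):
    """Largest singular value of the partial cross-covariance block
    (hypothetical summary, not the paper's test statistic)."""
    block = partial_cross_cov(X, i, j)
    return np.linalg.svd(block, compute_uv=False)[0]

if __name__ == "__main__":
    # Toy alignment: positions 0 and 1 always agree, positions 2 and 3 vary freely.
    toy = ["AACA", "CCAA", "AAAC", "CCCC", "AACC", "CCAC"]
    X = one_hot_msa(toy, alphabet="ACG")
    print(spectral_score(X, 0, 1), spectral_score(X, 0, 2))
```

A score near zero suggests the two positions are close to partially uncorrelated given the rest of the alignment. Turning such a score into a test with controlled Type I error requires the calibrated statistic developed in the paper.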
Taxonomy
Topics: Protein Structure and Dynamics · Genetic Associations and Epidemiology · Bioinformatics and Genomic Networks
