Investigating Statistical Conditions of Coevolutionary Signals that Enable Algorithmic Predictions of Protein Partners
José Fiorote, João Alves, Letícia Stock, Werner Treptow

TL;DR
This paper explores how coevolutionary signals in protein sequences can be used to predict protein partners without relying on 3D structures.
Contribution
The study introduces a Markov stochastic model that predicts the number of correct protein partners from coevolutionary information in amino acid sequences.
Findings
Algorithmic approaches that maximize coevolutionary information struggle to resolve partners in protein families with M ≥ 100 sequences.
Ignoring mismatches in similar sequences improves true-positive prediction rates.
The model distinguishes optimized solutions from degenerate ones in terms of the coevolutionary parameters {α, σ₀²}.
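The second finding can be made concrete with a small scoring sketch. It compares a strict true-positive rate with one that does not penalize a mismatch when the predicted partner is similar to the true one. The function name, the dictionary layout, and the `cluster_of` similarity grouping are hypothetical and chosen for illustration; the summary does not say how the paper defines sequence similarity.

```python
def true_positive_rate(predicted, truth, cluster_of=None):
    """Fraction of family-A sequences whose predicted partner counts as correct.

    predicted, truth: dicts mapping each family-A sequence id to a family-B id.
    cluster_of: optional dict mapping each family-B id to a similarity-cluster
        label (an assumed grouping, e.g. from a sequence-identity threshold).
        When given, a predicted partner that falls in the same cluster as the
        true partner is counted as a hit, i.e. mismatches among similar
        sequences are disregarded.
    """
    if not truth:
        raise ValueError("truth must contain at least one pairing")
    hits = 0
    for a_id, true_b in truth.items():
        pred_b = predicted.get(a_id)
        if pred_b is None:
            continue  # no prediction for this sequence: counted as a miss
        if pred_b == true_b:
            hits += 1
        elif cluster_of is not None and cluster_of.get(pred_b) == cluster_of.get(true_b):
            hits += 1
    return hits / len(truth)


# Toy example: b1 and b2 are near-identical paralogs; b3 and b4 are not.
truth = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
predicted = {"a1": "b2", "a2": "b1", "a3": "b4", "a4": "b3"}
clusters = {"b1": "c1", "b2": "c1", "b3": "c2", "b4": "c3"}

strict = true_positive_rate(predicted, truth)
tolerant = true_positive_rate(predicted, truth, cluster_of=clusters)
print(strict, tolerant)
```

In this toy case the swap between b1 and b2 no longer counts as an error under the tolerant score, while the swap between the dissimilar b3 and b4 still does.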
Abstract
This study examines the statistical conditions of coevolutionary signals that allow algorithmic predictions of protein partners based on amino acid sequences rather than 3D structures. It introduces a Markov stochastic model that predicts the number of correct protein partners based on coevolutionary information. The model defines state probabilities using a Poisson mixture of normal distributions, with key parameters including the total number of protein sequences M, the coevolutionary information gap α, and the variance σ₀². The model suggests that algorithmic approaches that maximize coevolutionary information cannot effectively resolve partners in protein families with a large number of sequences (M ≥ 100). The model also shows that true-positive (TP) rates can be enhanced by disregarding mismatches among similar sequences. This approach allows a distinction, in terms of {α, σ₀²}, between…
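As a rough illustration of what a Poisson mixture of normal distributions looks like numerically, the sketch below evaluates such a density in plain Python. The Poisson rate `lam`, the component means k·α, and the shared variance σ₀² are assumptions made for illustration only; the abstract does not give the paper's actual parameterization, nor how M enters the state probabilities.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k for rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def poisson_normal_mixture_pdf(x, lam, alpha, sigma0_sq, k_max=60):
    """Density sum_k Poisson(k; lam) * Normal(x; k * alpha, sigma0_sq).

    Assumed form (not taken from the paper): component k is centered at
    k * alpha and all components share the variance sigma0_sq. The sum is
    truncated at k_max, which is safe when lam is much smaller than k_max.
    """
    return sum(
        poisson_pmf(k, lam) * normal_pdf(x, k * alpha, sigma0_sq)
        for k in range(k_max + 1)
    )


# Evaluate the density at a few points for illustrative parameter values.
for x in (0.0, 1.5, 3.0, 6.0):
    print(x, poisson_normal_mixture_pdf(x, lam=2.0, alpha=1.5, sigma0_sq=0.25))
```

Under this assumed form the mixture mean is lam·α, so a larger gap α spreads the components further apart relative to σ₀² and makes them easier to tell apart.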
Taxonomy
Topics: Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics · Microbial Metabolic Engineering and Bioproduction
