A natural upper bound to the accuracy of predicting protein stability changes upon mutations
Ludovica Montanucci, Pier Luigi Martelli, Nir Ben-Tal, Piero Fariselli

TL;DR
This paper estimates a theoretical upper bound on the accuracy of predicting protein stability changes upon mutation, given the noise and distribution of the experimental data. It finds that current methods are close to this bound and that differences between datasets can make comparisons between predictors misleading.
Contribution
It introduces a novel analytical framework for estimating the maximum achievable accuracy of protein stability change predictions from the intrinsic properties of the data.
Findings
The upper bound on the Pearson correlation between predicted and observed values is approximately 0.7-0.8.
Current predictors are near the theoretical performance limit.
Dataset noise and distribution significantly influence prediction accuracy.
Abstract
Accurate prediction of protein stability changes upon single-site variations (DDG) is important for protein design, as well as for our understanding of the mechanisms of genetic diseases. The performance of high-throughput computational methods to this end is evaluated mostly based on the Pearson correlation coefficient between predicted and observed data, assuming that the upper bound would be 1 (perfect correlation). However, the performance of these predictors can be limited by the distribution and noise of the experimental data. Here we estimate, for the first time, a theoretical upper bound to the DDG prediction performance imposed by the intrinsic structure of currently available DDG data. Given a set of measured DDG protein variations, the theoretically best predictor is estimated based on its similarity to another set of experimentally determined DDG values. We investigate the…
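The idea that noise and data spread cap the achievable correlation can be illustrated with a small simulation. The sketch below is not the paper's derivation. It assumes Gaussian "true" DDG values and independent Gaussian measurement noise, and it treats a second, independent set of measurements of the same variants as the best possible predictor of the first. Under those assumptions the expected Pearson correlation between the two sets is the classical reliability ratio, var_true / (var_true + var_noise). The function names and the values of `sigma_true` and `sigma_noise` are hypothetical and chosen for illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulated_upper_bound(sigma_true, sigma_noise, n=20000, seed=0):
    """Correlate two independent noisy measurements of the same true values.

    Assumption: true values and noise are both zero-mean Gaussian.
    """
    rng = random.Random(seed)
    true_values = [rng.gauss(0.0, sigma_true) for _ in range(n)]
    set_a = [t + rng.gauss(0.0, sigma_noise) for t in true_values]
    set_b = [t + rng.gauss(0.0, sigma_noise) for t in true_values]
    return pearson(set_a, set_b)


def analytic_upper_bound(sigma_true, sigma_noise):
    """Reliability ratio: expected correlation between two noisy replicates."""
    return sigma_true ** 2 / (sigma_true ** 2 + sigma_noise ** 2)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    sigma_true, sigma_noise = 1.0, 0.5
    print("analytic :", analytic_upper_bound(sigma_true, sigma_noise))
    print("simulated:", simulated_upper_bound(sigma_true, sigma_noise))
```

In this toy model the bound rises when the true values are more spread out and falls as measurement noise grows, which is consistent with the observation that datasets with different distributions can yield different ceilings for the same predictor.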
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic Associations and Epidemiology · RNA and protein synthesis mechanisms
