On the performance of combined dichotomic predictors of natively unfolded proteins
Antonio Deiana, Andrea Giansanti

TL;DR
This study evaluates combined dichotomic predictors of natively unfolded proteins, introduces a new strictly unanimous score, and finds that proteins left unclassified by that score are mainly false predictions, with implications across different kingdoms.
Contribution
The paper introduces a strictly unanimous score S_{SU} for combining predictors and demonstrates its effectiveness in improving prediction accuracy for natively unfolded proteins.
Findings
Unclassified proteins are mainly false predictions.
Performance improves when ambiguous proteins are removed.
Scaling law relates unfolded proteins to genome size.
Abstract
The performance of single folding predictors and combination scores is critically evaluated. We test mean packing, mean pairwise energy and the new index gVSL2 on a dataset of 743 folded proteins and 81 natively unfolded proteins. The individual performance of these predictors is comparable to, or even better than, that of other proposed methods. We introduce here a strictly unanimous score S_{SU} that combines them but leaves undecided those sequences classified differently by two single predictors. The performance of the single predictors increases significantly on a dataset purged of the proteins left unclassified by S_{SU}, indicating that unclassified proteins are mainly false predictions. Amino acid composition is the main determinant considered by these predictors; unclassified proteins therefore have a composition compatible with both folded and unfolded status. This is why purging a…
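The combination rule described in the abstract can be illustrated with a minimal sketch. It assumes each single predictor has already been reduced to a dichotomic call (True for natively unfolded, False for folded). The function names, the three-predictor toy calls, and the labels below are hypothetical and are not taken from the paper; the sketch only shows how a strictly unanimous rule leaves disagreeing cases undecided and how a single predictor can then be re-evaluated on the purged set.

```python
from typing import List, Optional, Sequence


def strictly_unanimous(votes: Sequence[bool]) -> Optional[bool]:
    """Return the shared call if every predictor agrees, else None (undecided)."""
    if not votes:
        raise ValueError("at least one predictor call is required")
    first = votes[0]
    return first if all(v == first for v in votes) else None


def decided_indices(calls: Sequence[Sequence[bool]]) -> List[int]:
    """Indices of proteins that the unanimous rule does classify."""
    return [i for i, votes in enumerate(calls) if strictly_unanimous(votes) is not None]


def accuracy(preds: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of calls that match the reference labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: one tuple of three predictor calls per protein.
calls = [
    (True, True, True),     # unanimous: unfolded
    (False, False, False),  # unanimous: folded
    (True, False, True),    # disagreement: left undecided
    (False, False, True),   # disagreement: left undecided
]
labels = [True, False, False, False]  # made-up reference classes

kept = decided_indices(calls)
first_predictor = [votes[0] for votes in calls]

full_acc = accuracy(first_predictor, labels)
purged_acc = accuracy([first_predictor[i] for i in kept], [labels[i] for i in kept])
print(f"full set: {full_acc:.2f}, purged set: {purged_acc:.2f}")
```

Returning None for undecided cases, rather than forcing a majority vote, keeps the ambiguous sequences visible as a separate group, which is what allows them to be purged and examined on their own.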
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
