Towards Solving the Inverse Protein Folding Problem
Yoojin Hong, Kyung Dae Ko, Gaurav Bhardwaj, Zhenhai Zhang, Damian B. van Rossum, and Randen L. Patterson

TL;DR
This paper demonstrates that structural sequence profiles derived from PSSMs outperform traditional homology-modeling algorithms in fold recognition, enabling rapid and scalable protein fold annotation at proteomic levels.
Contribution
The study introduces structural sequence profiles for fold recognition, reports that they outperform several popular homology-modeling algorithms, and proposes applying them to large-scale protein annotation.
Findings
Structural sequence profiles outperform homology-modeling algorithms.
Profiles reconstitute SCOP fold classifications.
Profiles enable rapid proteomic-scale fold annotation.
Abstract
Accurately assigning folds for divergent protein sequences is a major obstacle to structural studies and underlies the inverse protein folding problem. Herein, we outline our theories for fold-recognition in the "twilight-zone" of sequence similarity (<25% identity). Our analyses demonstrate that structural sequence profiles built using Position-Specific Scoring Matrices (PSSMs) significantly outperform multiple popular homology-modeling algorithms for relating and predicting structures given only their amino acid sequences. Importantly, structural sequence profiles reconstitute SCOP fold classifications in control and test datasets. Results from our experiments suggest that structural sequence profiles can be used to rapidly annotate protein folds at proteomic scales. We propose that encoding the entire Protein DataBank (~1070 folds) into structural sequence profiles would extract…
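To make the idea of a PSSM-based profile concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the uniform background frequency, the pseudocount, the ungapped sliding-window scoring, and the function names (`build_pssm`, `best_window_score`, `profile_vector`) are all assumptions made for illustration. It builds a log-odds PSSM from a small ungapped alignment, scores a query against it, and collects the scores against a library of PSSMs into a vector, which is one possible reading of a "profile".

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build a log-odds PSSM: one {residue: score} dict per alignment column.

    Assumptions (illustrative only): equal-length, ungapped sequences and a
    uniform background frequency for all 20 amino acids.
    """
    background = 1.0 / len(AMINO_ACIDS)
    length = len(aligned_seqs[0])
    pssm = []
    for col in range(length):
        # Count standard residues observed in this column.
        counts = Counter(s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def best_window_score(pssm, query):
    """Slide the PSSM along the query without gaps; return the best total score."""
    width = len(pssm)
    if len(query) < width:
        return float("-inf")
    best = float("-inf")
    for start in range(len(query) - width + 1):
        # Unknown residues contribute a neutral score of 0.0.
        score = sum(pssm[i].get(query[start + i], 0.0) for i in range(width))
        best = max(best, score)
    return best


def profile_vector(pssm_library, query):
    """Score a query against every PSSM in a library (hypothetical 'profile')."""
    return [best_window_score(p, query) for p in pssm_library]


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDF", "ACDE"]  # toy data, not from the paper
    toy_pssm = build_pssm(toy_alignment)
    print(profile_vector([toy_pssm], "GGACDEGG"))
```

In this sketch, two sequences could then be compared through their score vectors rather than by direct alignment, which is why the representation can remain informative when pairwise identity is low. How the paper itself constructs and compares its profiles is not specified in the abstract above.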
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Bioinformatics and Genomic Networks
