TL;DR
This paper introduces an unsupervised Bayesian classifier that identifies protein conformational states from atom-to-atom pairwise contacts. It identifies the conformational transition with >95% accuracy in most test cases, lets the level of structural detail be varied, and has broad applicability in structural biology.
Contribution
It adapts the naive Bayes classifier to atom-to-atom pairwise contacts in protein structures and introduces a new connection to information entropy that lets the level of structural detail be varied without spoiling the categorization.
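To make the role of information entropy concrete, here is a minimal sketch that scores each binary contact by its Shannon entropy across frames. It is a generic illustration and not the paper's specific construction: the function name `contact_entropy` and the toy contact matrix are assumptions introduced here. The only point it shows is that a contact that never changes has zero entropy and so cannot help separate states.

```python
import numpy as np

def contact_entropy(X):
    """Per-contact Shannon entropy in bits (hypothetical helper).

    X is a (frames, contacts) array of 0/1 contact indicators.
    """
    p = X.mean(axis=0)  # fraction of frames in which each contact is formed
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    # p = 0 or p = 1 produces 0 * log(0) = nan; the limit of that term is 0.
    return np.nan_to_num(h)

# Toy data (assumed): contact 0 is never formed, contact 1 is formed in
# half of the frames, contact 2 is always formed.
X_toy = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [0, 1, 1],
                  [0, 0, 1]])
h_toy = contact_entropy(X_toy)
informative = np.flatnonzero(h_toy > 0)  # indices of contacts that vary
```

Only the contact that switches between frames survives the `h_toy > 0` filter, which is the sense in which entropy measures how much a contact can contribute to a classification.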
Findings
Identifies the conformational transition with >95% accuracy in most cases, across a series of test structures and one real protein
Gives consistent results as the number of atoms and time samples is varied over 1.5 orders of magnitude
Provides a transparent, extendable Bayesian framework for structural classification
Abstract
Automated identification of protein conformational states from simulation of an ensemble of structures is a hard problem because it requires teaching a computer to recognize shapes. We adapt the naive Bayes classifier from the machine learning community for use on atom-to-atom pairwise contacts. The result is an unsupervised learning algorithm that samples a 'distribution' over potential classification schemes. We apply the classifier to a series of test structures and one real protein, showing that it identifies the conformational transition with >95% accuracy in most cases. A nontrivial feature of our adaptation is a new connection to information entropy that allows us to vary the level of structural detail without spoiling the categorization. This is confirmed by comparing results as the number of atoms and time-samples are varied over 1.5 orders of magnitude. Further, the method's…
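The idea of an unsupervised naive Bayes classifier on pairwise contacts can be sketched as a Bernoulli mixture model sampled with Gibbs sampling. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 8.0 distance cutoff, the uniform Beta(1, 1) and Dirichlet(1) priors, and the synthetic two-state data are all chosen here for illustration. Each sweep yields one complete labeling of the frames, so the collected sweeps act as samples over candidate classification schemes.

```python
import numpy as np

def contact_features(coords, cutoff=8.0):
    """Binary pairwise contacts from coordinates of shape (frames, atoms, 3).

    The cutoff value is an assumption, not taken from the paper.
    """
    n_atoms = coords.shape[1]
    i, j = np.triu_indices(n_atoms, k=1)  # each unordered atom pair once
    dist = np.linalg.norm(coords[:, i, :] - coords[:, j, :], axis=-1)
    return (dist < cutoff).astype(np.int8)

def gibbs_bernoulli_mixture(X, K=2, n_sweeps=100, seed=0):
    """Sample frame labels for a K-state naive Bayes (Bernoulli mixture) model."""
    rng = np.random.default_rng(seed)
    T, _ = X.shape
    z = rng.integers(K, size=T)  # random initial labels
    samples = []
    for _ in range(n_sweeps):
        # 1) Sample state weights and per-state contact probabilities
        #    given the current labels (conjugate Dirichlet / Beta updates).
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(1.0 + counts)
        on = np.stack([X[z == k].sum(axis=0) for k in range(K)])
        theta = rng.beta(1.0 + on, 1.0 + counts[:, None] - on)
        theta = np.clip(theta, 1e-9, 1 - 1e-9)  # keep the logs finite
        # 2) Sample labels given parameters; contacts are treated as
        #    conditionally independent (the naive Bayes assumption).
        logp = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log1p(-theta).T
        logp -= logp.max(axis=1, keepdims=True)
        p = np.exp(logp)
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random(T)[:, None]
        z = np.minimum((u > np.cumsum(p, axis=1)).sum(axis=1), K - 1)
        samples.append(z.copy())
    return np.array(samples)

# Synthetic two-state trajectory (assumed data): 100 frames per state,
# 50 contacts, 40 of which switch between the two states.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 100)
p_state = np.array([[0.9] * 20 + [0.1] * 20 + [0.5] * 10,
                    [0.1] * 20 + [0.9] * 20 + [0.5] * 10])
X = (rng.random((200, 50)) < p_state[truth]).astype(np.int8)

samples = gibbs_bernoulli_mixture(X, K=2, n_sweeps=100)
z_last = samples[-1]
# State labels are arbitrary, so score agreement up to relabeling.
agreement = max(np.mean(z_last == truth), np.mean(z_last != truth))
```

With real structures, `contact_features` would turn a coordinate array into the binary matrix `X`. Comparing labelings across rows of `samples` (for example, how often two frames share a label) gives a label-independent summary of the sampled classifications.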
