Unsupervised learning of dynamical and molecular similarity using variance minimization
Brooke E. Husic, Vijay S. Pande

TL;DR
This paper introduces an unsupervised learning approach that groups molecular systems by similarity in their dynamics or structures using Ward's minimum variance objective. It is used to study how point mutations affect protein dynamics and to partition datasets for supervised learning.
Contribution
It applies Ward's minimum variance clustering to molecular dynamics and structural data, supporting both the analysis of mutation effects and the partitioning of datasets.
Findings
Clustered simulated tripeptides by their dynamics, using the Jensen-Shannon divergence between Markovian transition matrices.
Extended the method to two chemoinformatic datasets, partitioning them by structural similarity.
Used the resulting partitions to motivate a train/validation/test split for supervised learning that avoids overfitting.
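The idea of splitting a dataset along cluster boundaries can be sketched as follows. This is a minimal illustration, not the authors' procedure: the random binary feature vectors stand in for molecular fingerprints, and the helper `split_by_cluster`, the cluster count, and the 80/10/10 target fractions are all hypothetical choices made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Hypothetical stand-in for molecular fingerprints: 60 molecules, 32 binary features.
fingerprints = rng.integers(0, 2, size=(60, 32)).astype(float)

# Ward's minimum variance linkage on the raw feature vectors (Euclidean geometry).
Z = linkage(fingerprints, method="ward")
clusters = fcluster(Z, t=6, criterion="maxclust")  # cut the tree into at most 6 clusters


def split_by_cluster(clusters, fractions=(0.8, 0.1, 0.1)):
    """Assign whole clusters to train (0), validation (1), or test (2).

    Clusters are placed largest first into whichever split is furthest
    below its target size, so no cluster is shared between splits.
    """
    ids, sizes = np.unique(clusters, return_counts=True)
    targets = np.array(fractions) * len(clusters)
    filled = np.zeros(3)
    assignment = {}
    for cid, size in sorted(zip(ids, sizes), key=lambda pair: -pair[1]):
        k = int(np.argmax(targets - filled))  # split with the largest remaining deficit
        assignment[cid] = k
        filled[k] += size
    return np.array([assignment[c] for c in clusters])


split = split_by_cluster(clusters)
```

Because structurally similar molecules end up in the same split, validation and test performance is measured on molecules that differ from the training set, which is the motivation given for avoiding overfitting.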
Abstract
In this report, we present an unsupervised machine learning method for determining groups of molecular systems according to similarity in their dynamics or structures using Ward's minimum variance objective function. We first apply the minimum variance clustering to a set of simulated tripeptides using the information theoretic Jensen-Shannon divergence between Markovian transition matrices in order to gain insight into how point mutations affect protein dynamics. Then, we extend the method to partition two chemoinformatic datasets according to structural similarity to motivate a train/validation/test split for supervised learning that avoids overfitting.
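A minimal sketch of the first application, under stated assumptions: the transition matrices here are random row-stochastic matrices rather than models estimated from simulation, and `js_between` aggregates the row-wise Jensen-Shannon divergences with an unweighted mean, which is an illustrative choice and may differ from the weighting used in the paper. SciPy's Ward linkage is formally defined for Euclidean distances, so feeding it the Jensen-Shannon distance (the square root of the divergence) should be read as a heuristic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import jensenshannon, squareform


def random_transition_matrix(rng, n_states, bias):
    """Hypothetical stand-in for a Markov model: a random row-stochastic matrix.

    A larger `bias` puts more weight on the diagonal (more metastable states).
    """
    counts = rng.random((n_states, n_states)) + bias * np.eye(n_states)
    return counts / counts.sum(axis=1, keepdims=True)


def js_between(T_a, T_b):
    """Unweighted mean of row-wise Jensen-Shannon divergences (assumed aggregation)."""
    # scipy's jensenshannon returns the JS *distance*; squaring gives the divergence.
    per_row = [jensenshannon(p, q, base=2) ** 2 for p, q in zip(T_a, T_b)]
    return float(np.mean(per_row))


rng = np.random.default_rng(0)
# Two synthetic families of systems: weakly and strongly metastable.
systems = [random_transition_matrix(rng, 4, 0.5) for _ in range(3)]
systems += [random_transition_matrix(rng, 4, 20.0) for _ in range(3)]

n = len(systems)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        # Use the square root of the divergence as the pairwise distance.
        D[i, j] = D[j, i] = np.sqrt(js_between(systems[i], systems[j]))

Z = linkage(squareform(D), method="ward")  # agglomerate with Ward's criterion
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two groups
```

Each entry of `labels` gives the cluster assigned to the corresponding system; with real data, the systems would be the simulated tripeptide mutants and the matrices their estimated transition matrices.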
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Computational Drug Discovery Methods
