Automated design of collective variables using supervised machine learning
Mohammad M. Sultan, Vijay S. Pande

TL;DR
This paper introduces a data-driven method using supervised machine learning to automatically identify effective collective variables for enhanced sampling in molecular simulations, addressing a long-standing challenge in computational biophysics.
Contribution
It demonstrates how decision functions from supervised machine learning algorithms can serve as initial collective variables for accelerated sampling, providing a systematic approach for complex systems.
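The core idea can be illustrated with a minimal sketch. It assumes a two-state system whose frames have already been reduced to a short feature vector and labelled by state. A linear SVM is fitted here with plain stochastic subgradient descent on the hinge loss. This solver stands in for a library routine such as scikit-learn's `LinearSVC` and is not the paper's code. The signed decision value w·x + b is then read off as a one-dimensional CV. The helper names `train_linear_svm` and `svm_cv`, the hyperparameters, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(
    X: List[Vector], y: List[int], lr: float = 0.1, lam: float = 0.01, epochs: int = 200
) -> Tuple[List[float], float]:
    """Fit a soft-margin linear SVM by stochastic subgradient descent on the
    L2-regularised hinge loss. Labels must be -1 (state A) or +1 (state B).
    Hypothetical helper; hyperparameters are illustrative, not tuned."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (dot(w, xi) + b)
            # The regularisation term always shrinks w toward zero.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move the hyperplane away from this sample.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def svm_cv(w: Vector, b: float, x: Vector) -> float:
    """Signed decision value w.x + b used as the CV: negative on the
    state-A side of the hyperplane, positive on the state-B side."""
    return dot(w, x) + b


# Toy 2-D features (hypothetical): state A clusters near (-1, -1), state B near (1, 1).
X_A = [(-1.0, -1.2), (-1.3, -0.8), (-0.9, -1.1), (-1.1, -0.9)]
X_B = [(1.0, 1.2), (1.3, 0.8), (0.9, 1.1), (1.1, 0.9)]
w, b = train_linear_svm(X_A + X_B, [-1] * 4 + [1] * 4)
print([round(svm_cv(w, b, x), 2) for x in X_A + X_B])
```

Because the decision value is a smooth (here linear) function of the features, it can be evaluated and differentiated at every simulation step. This is what a CV needs in order to be biased.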
Findings
Support Vector Machine decision hyperplanes can guide sampling of slow transitions.
Logistic Regression probabilities effectively identify transition pathways.
Deep neural network classifiers can be used as CVs for complex molecular systems.
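The logistic-regression variant of these findings can be sketched in the same way. The sketch assumes two labelled states and uses the predicted probability P(state B | x) as the CV, so the CV runs from about 0 in state A to about 1 in state B. It also shows the analytic gradient dCV/dx = p(1 − p)·w, since a biasing engine needs CV derivatives to compute forces. The training loop is plain full-batch gradient descent on the regularised log loss. The names `train_logistic_regression`, `lr_cv`, and `lr_cv_grad` and all hyperparameters are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(
    X: List[Vector], y: List[int], lr: float = 0.5, lam: float = 1e-3, epochs: int = 500
) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the L2-regularised mean log loss.
    Labels must be 0 (state A) or 1 (state B). Hypothetical helper."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error p - y drives the log-loss gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def lr_cv(w: Vector, b: float, x: Vector) -> float:
    """CV = P(state B | x), about 0 in state A and about 1 in state B."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def lr_cv_grad(w: Vector, b: float, x: Vector) -> List[float]:
    """Analytic gradient dCV/dx = p (1 - p) w, needed for biasing forces."""
    p = lr_cv(w, b, x)
    return [p * (1.0 - p) * wj for wj in w]


# Hypothetical toy data; evaluate the CV along a straight path from A to B.
X_A = [(-1.0, -1.2), (-1.3, -0.8), (-0.9, -1.1), (-1.1, -0.9)]
X_B = [(1.0, 1.2), (1.3, 0.8), (0.9, 1.1), (1.1, 0.9)]
w, b = train_logistic_regression(X_A + X_B, [0] * 4 + [1] * 4)
path = [(-1.0 + 0.5 * k, -1.0 + 0.5 * k) for k in range(5)]
print([round(lr_cv(w, b, p), 3) for p in path])
```

A trained neural-network classifier could take the place of `lr_cv` in the same role, provided its output is differentiable with respect to the input features.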
Abstract
Selection of appropriate collective variables (CVs) for enhancing sampling of molecular simulations remains an unsolved problem in computational biophysics. In particular, picking initial CVs is challenging in higher dimensions. Which atomic coordinates, or transforms thereof, from a list of thousands should one pick for enhanced sampling runs? How does a modeler even begin to pick starting coordinates for investigation? This remains true even for simple two-state systems, and the difficulty only increases for multi-state systems. In this work, we solve the initial CV problem using a data-driven approach inspired by the field of supervised machine learning. Specifically, we show how the decision functions in supervised machine learning (SML) algorithms can be used as initial CVs (SML_cv) for accelerated sampling. Using solvated alanine dipeptide…
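The abstract's question of which coordinates "or transforms thereof" to feed a classifier can be made concrete with a small featurization sketch. It assumes, as is common for alanine dipeptide, that backbone torsion angles are used as inputs. The excerpt does not say which features the paper used. A torsion angle is computed from four atom positions, and each angle is then encoded as (sin, cos) so that −179° and +179° map to nearby points instead of opposite ends of the feature range. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> List[float]:
    return [ai - bi for ai, bi in zip(a, b)]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a: Vec3, b: Vec3) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle p0-p1-p2-p3 in radians, in [-pi, pi], IUPAC sign
    convention (atan2 form, which avoids the ill-conditioned acos form)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


def torsion_features(angles: Sequence[float]) -> List[float]:
    """Encode each periodic angle as (sin, cos) to remove the +/-pi seam."""
    feats: List[float] = []
    for a in angles:
        feats.extend((math.sin(a), math.cos(a)))
    return feats


# Hypothetical usage: phi and psi (radians) for one frame -> 4 classifier inputs.
print(torsion_features([math.radians(-60.0), math.radians(-45.0)]))
```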
Taxonomy
Topics: Machine Learning in Materials Science · Protein Structure and Dynamics · Advanced Proteomics Techniques and Applications
Methods: Logistic Regression
