TL;DR
This paper introduces a variational cross-validation framework for selecting and validating Markov state models in molecular kinetics, improving the accuracy of slow dynamical mode estimation by balancing systematic and statistical errors.
Contribution
It develops a new objective function, the generalized matrix Rayleigh quotient, and a variational theorem to guide model selection and prevent overfitting in MSMs.
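For orientation, the objective and its bound can be written compactly. The notation here is an assumption made for illustration and does not appear in this summary: $A$ is a matrix whose $m$ columns are trial slow modes expressed in some basis, $C$ is the time-lagged correlation matrix and $S$ the overlap matrix in that basis, and $\lambda_1 \ge \lambda_2 \ge \dots$ are the propagator eigenvalues.

```latex
\mathcal{R}(A) \;=\; \operatorname{Tr}\!\left[ \left(A^{\top} S A\right)^{-1} A^{\top} C A \right],
\qquad
\mathcal{R}(A) \;\le\; \sum_{i=1}^{m} \lambda_i .
```

The inequality holds when $C$ and $S$ are known exactly. Once they are estimated from finite data, the score evaluated on the same data used for fitting can exceed the bound, which is why the score is evaluated on held-out data.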
Findings
A variational theorem bounds the GMRQ from above by the sum of the leading propagator eigenvalues, giving model selection a theoretical basis.
Cross-validation detects and prevents overfitting in MSMs.
The method improves the accuracy of protein dynamics modeling.
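The findings above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the toy 4-state transition matrix `T`, the helper names (`gmrq`, `top_modes`, `estimate`, `simulate`), and the simple symmetrized count estimator. It scores a rank-`m` model by the GMRQ, checks the variational bound with exact matrices, and then fits on one simulated trajectory and scores on another.

```python
import numpy as np

def gmrq(A, C, S):
    """Generalized matrix Rayleigh quotient: Tr[(A^T S A)^{-1} A^T C A]."""
    return float(np.trace(np.linalg.solve(A.T @ S @ A, A.T @ C @ A)))

def top_modes(C, S, m):
    """Top-m solutions of C a = lambda S a, via a Cholesky reduction of S."""
    L_inv = np.linalg.inv(np.linalg.cholesky(S))
    vals, vecs = np.linalg.eigh(L_inv @ C @ L_inv.T)   # ascending order
    order = np.argsort(vals)[::-1][:m]
    return vals[order], L_inv.T @ vecs[:, order]

def estimate(traj, n_states, lag=1):
    """Symmetrized correlation and overlap matrices from a discrete trajectory.
    (A simple estimator chosen for this sketch, not necessarily the paper's.)"""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (traj[:-lag], traj[lag:]), 1.0)
    C = 0.5 * (counts + counts.T) / counts.sum()
    S = np.diag(C.sum(axis=1))
    return C, S

def simulate(T, n_steps, rng):
    """Sample a discrete-state trajectory from transition matrix T."""
    states = np.empty(n_steps, dtype=int)
    states[0] = rng.integers(T.shape[0])
    for t in range(1, n_steps):
        states[t] = rng.choice(T.shape[0], p=T[states[t - 1]])
    return states

# Hypothetical reversible 4-state chain with uniform stationary distribution.
T = np.array([[0.90, 0.08, 0.01, 0.01],
              [0.08, 0.90, 0.01, 0.01],
              [0.01, 0.01, 0.90, 0.08],
              [0.01, 0.01, 0.08, 0.90]])
m = 2

# Exact matrices: C = diag(pi) T and S = diag(pi), with pi = 1/4 for each state.
C_exact, S_exact = 0.25 * T, 0.25 * np.eye(4)
vals_exact, A_exact = top_modes(C_exact, S_exact, m)
bound = vals_exact.sum()   # eigenvalues 1.0 and 0.96, so the bound is 1.96

# Fit on one finite trajectory, score on the same data and on held-out data.
rng = np.random.default_rng(0)
C_tr, S_tr = estimate(simulate(T, 5000, rng), 4)
C_te, S_te = estimate(simulate(T, 5000, rng), 4)
vals_tr, A_tr = top_modes(C_tr, S_tr, m)
train_score = gmrq(A_tr, C_tr, S_tr)   # equals vals_tr.sum(); may exceed the bound
test_score = gmrq(A_tr, C_te, S_te)    # held-out score used for model selection
print(f"bound={bound:.4f} train={train_score:.4f} test={test_score:.4f}")
```

In a model-selection loop, the same train/test scoring would be repeated over folds and over candidate hyperparameters (for example, the number of states), keeping the candidate with the best mean held-out score.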
Abstract
Markov state models (MSMs) are a widely used method for approximating the eigenspectrum of the molecular dynamics propagator, yielding insight into the long-timescale statistical kinetics and slow dynamical modes of biomolecular systems. However, the lack of a unified theoretical framework for choosing between alternative models has hampered progress, especially for non-experts applying these methods to novel biological systems. Here, we consider cross-validation with a new objective function for estimators of these slow dynamical modes, a generalized matrix Rayleigh quotient (GMRQ), which measures the ability of a rank-m projection operator to capture the slow subspace of the system. It is shown that a variational theorem bounds the GMRQ from above by the sum of the first m eigenvalues of the system's propagator, but that this bound can be violated when the requisite matrix…
