On the application of Good-Turing statistics to quantify convergence of biomolecular simulations
Panagiotis I. Koukos, Nicholas M. Glykos

TL;DR
This paper introduces a probability-based method using Good-Turing statistics to assess the convergence of biomolecular simulations by estimating the likelihood of unobserved configurations, providing a more rigorous convergence criterion.
Contribution
The authors adapt Good-Turing frequency estimation to molecular dynamics data, offering a novel, stable, and consistent approach for quantifying simulation convergence based on configuration sampling.
Findings
The method is computationally stable
The procedure is internally consistent
The method is applicable to a variety of trajectories
Abstract
Quantifying convergence and sufficient sampling of macromolecular molecular dynamics simulations is more often than not a source of controversy (and of various ad hoc solutions) in the field. Clearly, the only reasonable, consistent and satisfying way to infer convergence (or otherwise) of a molecular dynamics trajectory must be based on probability theory. Ideally, the question we would wish to answer is the following: "What is the probability that a molecular configuration important for the analysis in hand has not yet been observed?". Here we propose a method for answering a variant of this question by using the Good-Turing formalism for frequency estimation of unobserved species in a sample. Although several approaches may be followed in order to deal with the problem of discretizing the configurational space, for this work we use the classical RMSD matrix as a means to answering…
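To make the idea of "frequency estimation of unobserved species" concrete, here is a minimal sketch of the basic Good-Turing missing-mass estimate: the total probability of species not yet seen is approximated by N1 / N, where N1 is the number of species observed exactly once and N is the sample size. This is only the textbook estimator, not the authors' full procedure. The function name `unobserved_probability` and the discrete configuration labels (assumed here to come from some prior discretization of the trajectory, for example clustering on an RMSD matrix) are hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def unobserved_probability(labels: Sequence[Hashable]) -> float:
    """Basic Good-Turing estimate of the probability mass of unseen species.

    `labels` holds one discrete configuration label per trajectory frame
    (hypothetical input; how frames are discretized is an assumption here).
    Returns N1 / N, with N1 the number of labels seen exactly once.
    """
    if len(labels) == 0:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    # N1: number of distinct configurations that occur exactly once
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(labels)


# Made-up labels for 10 frames: A occurs 6 times, B twice, C and D once each
frames = ["A", "A", "B", "A", "C", "B", "A", "A", "D", "A"]
p0 = unobserved_probability(frames)
print(f"Estimated probability of an unobserved configuration: {p0:.2f}")
```

In this toy input two of the ten frames are singletons, so the estimate is 0.2. A trajectory in which every configuration has been revisited at least once would give an estimate of zero, which is the intuition behind using this quantity as a convergence indicator.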
