Quantifying the uncertainty of molecular dynamics simulations: Good-Turing statistics revisited
Vasiliki Tsampazi, Nicholas M. Glykos

TL;DR
This paper revisits Good-Turing statistics for molecular dynamics, introducing a memory-efficient variant that enables analysis of extremely long trajectories with millions of structures.
Contribution
A new linear-memory variant of the Good-Turing algorithm is proposed, allowing application to very long molecular dynamics simulations.
Findings
The new method produces results consistent with the original implementation.
It can handle trajectories with up to 22 million structures.
The algorithm is available as a computer program for public use.
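To see why linear memory scaling matters at this trajectory length, the following back-of-the-envelope sketch compares the storage for a full two-dimensional RMSD matrix with storage that grows linearly in the number of structures. The 4-byte value size and the single stored value per structure are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
def full_matrix_bytes(n_structures: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for a full N x N RMSD matrix (assumed 4-byte floats)."""
    return n_structures * n_structures * bytes_per_value


def linear_bytes(n_structures: int, values_per_structure: int = 1,
                 bytes_per_value: int = 4) -> int:
    """Bytes needed when storage grows linearly with N.

    values_per_structure is a hypothetical constant chosen for illustration;
    the summary does not say what the new algorithm stores per structure.
    """
    return n_structures * values_per_structure * bytes_per_value


if __name__ == "__main__":
    n = 22_000_000  # trajectory length quoted in the findings
    print(f"full matrix: {full_matrix_bytes(n) / 1e15:.3f} PB")
    print(f"linear:      {linear_bytes(n) / 1e6:.1f} MB")
```

Under these assumptions the full matrix would need roughly 1.9 petabytes, while linear storage would need about 88 megabytes. The exact constants depend on the actual implementation.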
Abstract
We have previously shown that Good-Turing statistics can be applied to molecular dynamics trajectories to estimate the probability of observing completely new (thus far unobserved) biomolecular structures, and that the method is stable and dependable, with verifiable predictions. The major problem with that initial algorithm was the requirement to calculate and store in memory the two-dimensional RMSD matrix of the currently available trajectory. This requirement precluded the application of the method to very long simulations. Here we describe a new variant of the Good-Turing algorithm whose memory requirements scale linearly with the number of structures in the trajectory, making it suitable even for extremely long simulations. We show that the new method gives results essentially identical to those of the older implementation, and present results obtained from trajectories…
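As a rough illustration of the idea behind the abstract, the classic Good-Turing estimate of the probability that the next observation is of a previously unseen kind is P0 ≈ N1 / N, where N1 is the number of kinds seen exactly once and N is the total number of observations. The sketch below assumes the trajectory frames have already been assigned to discrete cluster labels. That assignment, and the function name, are assumptions made for this example. It is not the authors' RMSD-based procedure.

```python
from collections import Counter
from typing import Hashable, Sequence


def good_turing_unseen_probability(labels: Sequence[Hashable]) -> float:
    """Basic Good-Turing estimate P0 = N1 / N of the unseen probability.

    labels: one (hypothetical) cluster label per trajectory frame.
    """
    if len(labels) == 0:
        raise ValueError("at least one observation is required")
    counts = Counter(labels)                                 # frames per cluster
    n_singletons = sum(1 for c in counts.values() if c == 1) # N1
    return n_singletons / len(labels)                        # N1 / N


if __name__ == "__main__":
    # Toy data: clusters 2, 3 and 4 are each seen once among 10 frames.
    toy_labels = [0, 0, 0, 1, 1, 2, 3, 0, 1, 4]
    print(good_turing_unseen_probability(toy_labels))
```

For the toy labels, three clusters are singletons among ten frames, so the estimated probability of next observing a new cluster is 3/10. A value near zero would suggest that the sampled states are close to exhausted at the chosen clustering resolution.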
Taxonomy
Topics: Protein Structure and Dynamics · Computational Drug Discovery Methods · Gene Regulatory Network Analysis
