TL;DR
This paper examines the limitations of bootstrapping for estimating uncertainty in small-sample, high log-variance data, highlights a systematic bias in the resulting confidence intervals, and cautiously recommends the Bayesian bootstrap as a more reliable alternative.
Contribution
It demonstrates the biases of standard bootstrap in high log-variance data and advocates for Bayesian bootstrap as a more consistent alternative, with detailed analysis and practical insights.
Findings
Standard bootstrap exhibits a systematic bias of the lower confidence limit in log space for high log-variance data.
Bayesian bootstrap provides more reliable uncertainty intervals than standard bootstrap.
Both methods struggle with small sample sizes and high log-variance, underestimating the mean.
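The two resampling schemes compared in the findings above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data values, function names, and resample counts are hypothetical, and the Bayesian bootstrap is assumed here to be the Rubin-style variant with flat Dirichlet weights, which may differ in detail from what the authors use.

```python
import random


def standard_bootstrap_means(data, n_resamples=10_000, seed=0):
    """Means of resamples drawn with replacement (standard bootstrap)."""
    rng = random.Random(seed)
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(n_resamples)]


def bayesian_bootstrap_means(data, n_resamples=10_000, seed=0):
    """Weighted means under flat Dirichlet weights (assumed Rubin-style variant)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Normalized Exp(1) draws are distributed as Dirichlet(1, ..., 1).
        g = [rng.expovariate(1.0) for _ in data]
        total = sum(g)
        means.append(sum(w * x for w, x in zip(g, data)) / total)
    return means


def percentile_interval(samples, level=0.95):
    """Central interval from sorted samples using simple floor-index quantiles."""
    s = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo = s[int(tail * (len(s) - 1))]
    hi = s[int((1.0 - tail) * (len(s) - 1))]
    return lo, hi


# Hypothetical data spanning several orders of magnitude; not from the paper.
data = [2e-6, 5e-5, 3e-4, 1e-3, 4e-2, 0.7]
print("standard:", percentile_interval(standard_bootstrap_means(data)))
print("bayesian:", percentile_interval(bayesian_bootstrap_means(data)))
```

The standard bootstrap can only produce means from a discrete set of resamples, including ones that omit the largest values entirely, whereas the Dirichlet weights give every data point a nonzero weight in each replicate. Comparing the lower limits of the two intervals on a log scale is one way to look for the kind of bias the findings describe.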
Abstract
Recent advances in molecular simulations allow the evaluation of previously unattainable observables, such as rate constants for protein folding. However, these calculations are usually computationally expensive and even significant computing resources may result in a small number of independent estimates spread over many orders of magnitude. Such small-sample, high "log-variance" data are not readily amenable to analysis using the standard uncertainty (i.e., "standard error of the mean") because unphysical negative limits of confidence intervals result. Bootstrapping, a natural alternative guaranteed to yield a confidence interval within the minimum and maximum values, also exhibits a striking systematic bias of the lower confidence limit in log space. As we show, bootstrapping artifactually assigns high probability to improbably low mean values. A second alternative, the Bayesian…
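The failure of the standard error of the mean described in the abstract can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical positive values spread over several orders of magnitude (not data from the paper) and a normal-approximation multiplier of 1.96, which is an assumption made for illustration.

```python
import math
import statistics

# Hypothetical estimates spanning several orders of magnitude; not from the paper.
data = [2e-6, 5e-5, 3e-4, 1e-3, 4e-2, 0.7]

mean = statistics.mean(data)
sem = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean

# Normal-approximation multiplier; a Student-t value would widen the interval further.
z = 1.96
lower, upper = mean - z * sem, mean + z * sem

print(f"mean = {mean:.4g}, SEM = {sem:.4g}")
print(f"symmetric interval = ({lower:.4g}, {upper:.4g})")
print("lower limit negative:", lower < 0)
```

Because a single large value dominates both the mean and the standard deviation, the symmetric interval extends below zero even though every input is positive, which is unphysical for a quantity such as a rate constant.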
