Statistical uncertainty analysis for small-sample, high log-variance data: Cautions for bootstrapping and Bayesian bootstrapping

Barmak Mostofian; Daniel M. Zuckerman

arXiv:1806.01998 · stat.AP · March 27, 2019

TL;DR

This paper examines the limitations of bootstrapping for estimating uncertainty in small-sample, high log-variance data. It highlights a systematic bias in the standard bootstrap and recommends the Bayesian bootstrap, with caution, for more reliable confidence intervals.

Contribution

It demonstrates the biases of standard bootstrap in high log-variance data and advocates for Bayesian bootstrap as a more consistent alternative, with detailed analysis and practical insights.

Findings

01 Standard bootstrap exhibits systematic bias in high log-variance data.

02 Bayesian bootstrap provides more reliable uncertainty intervals than standard bootstrap.

03 Both methods struggle with small sample sizes and high log-variance, underestimating the mean.
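
To make the Bayesian bootstrap named in the findings concrete, the following is a minimal sketch, not the authors' implementation. It assumes Rubin's formulation, in which each posterior draw of the mean is a weighted average of the data with weights drawn from a flat Dirichlet distribution. The function names and the `rates` values are hypothetical, chosen only for illustration.

```python
import random


def bayesian_bootstrap_means(data, n_draws=10000, rng=None):
    """Draw posterior samples of the mean using flat Dirichlet weights."""
    rng = rng or random.Random(0)
    draws = []
    for _ in range(n_draws):
        # Normalized Exponential(1) variates are Dirichlet(1, ..., 1) weights.
        g = [rng.expovariate(1.0) for _ in data]
        total = sum(g)
        draws.append(sum(w * x for w, x in zip(g, data)) / total)
    return draws


def percentile_interval(samples, level=0.95):
    """Central interval taken from the sorted samples."""
    s = sorted(samples)
    n = len(s)
    lo_idx = int((1 - level) / 2 * n)
    hi_idx = int((1 + level) / 2 * n) - 1
    return s[lo_idx], s[hi_idx]


# Hypothetical small sample spanning several orders of magnitude.
rates = [1e-6, 3e-5, 2e-4, 5e-3, 1e-1]
bb_lo, bb_hi = percentile_interval(bayesian_bootstrap_means(rates))
print(f"Bayesian bootstrap 95% interval for the mean: [{bb_lo:.3g}, {bb_hi:.3g}]")
```

Because the weights are continuous, every data point contributes to every draw, whereas a standard bootstrap resample can omit the largest values altogether. The sketch does not reproduce the paper's analysis and says nothing about how well either interval covers the true mean.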

Abstract

Recent advances in molecular simulations allow the evaluation of previously unattainable observables, such as rate constants for protein folding. However, these calculations are usually computationally expensive and even significant computing resources may result in a small number of independent estimates spread over many orders of magnitude. Such small-sample, high "log-variance" data are not readily amenable to analysis using the standard uncertainty (i.e., "standard error of the mean") because unphysical negative limits of confidence intervals result. Bootstrapping, a natural alternative guaranteed to yield a confidence interval within the minimum and maximum values, also exhibits a striking systematic bias of the lower confidence limit in log space. As we show, bootstrapping artifactually assigns high probability to improbably low mean values. A second alternative, the Bayesian…
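
The two behaviors described in the abstract can be illustrated with a short sketch: the standard-error interval can extend below zero, while a percentile bootstrap interval stays between the smallest and largest observations. This is an illustration under stated assumptions, not the paper's procedure. The `rates` values are hypothetical, the standard-error interval uses a normal approximation (`z = 1.96`), and the bootstrap interval is the simple percentile variant.

```python
import math
import random
import statistics


def sem_interval(data, z=1.96):
    """Mean +/- z * standard error of the mean (normal approximation)."""
    mean = statistics.fmean(data)
    sem = statistics.stdev(data) / math.sqrt(len(data))
    return mean - z * sem, mean + z * sem


def bootstrap_interval(data, n_resamples=10000, level=0.95, rng=None):
    """Percentile bootstrap interval for the mean (resampling with replacement)."""
    rng = rng or random.Random(0)
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical rate estimates spread over five orders of magnitude.
rates = [1e-6, 3e-5, 2e-4, 5e-3, 1e-1]

sem_lo, sem_hi = sem_interval(rates)
print(f"SEM interval:       [{sem_lo:.3g}, {sem_hi:.3g}]")  # lower limit is negative here

boot_lo, boot_hi = bootstrap_interval(rates)
print(f"Bootstrap interval: [{boot_lo:.3g}, {boot_hi:.3g}]")
```

For a sample of size n = 5, a fraction (4/5)^5, roughly 33%, of resamples omit the largest value entirely. When that value dominates the mean, those resample means drop by orders of magnitude, which is one way to see how the lower limit can be pulled far down in log space.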
