The Big Data Bootstrap
Ariel Kleiner (UC Berkeley), Ameet Talwalkar (UC Berkeley), Purnamrita Sarkar (UC Berkeley), Michael Jordan (UC Berkeley)

TL;DR
The paper introduces the Bag of Little Bootstraps (BLB), a computationally efficient method for assessing estimator quality that scales to large datasets and parallel computing environments while retaining the bootstrap's statistical properties.
Contribution
The paper proposes BLB, a procedure that combines features of the bootstrap and subsampling, and supports it with extensive empirical and theoretical analysis for large-scale data.
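A minimal sketch of the BLB idea in Python/NumPy, assuming the quality measure is the standard error of a sample mean. The names blb_std_error and weighted_mean, and the defaults gamma=0.7, s=20, r=100, are hypothetical illustration choices, not values stated in this summary. Each of the s subsamples holds b = n^gamma distinct points; each of the r resamples has full size n but is stored as multinomial counts over those b points, so the estimator only ever touches b values.

```python
import numpy as np

def weighted_mean(x, w):
    # Estimator evaluated on a weighted sample: w[i] is the number of copies of x[i]
    return np.sum(w * x) / np.sum(w)

def blb_std_error(data, estimator=weighted_mean, gamma=0.7, s=20, r=100, seed=0):
    """Sketch of BLB estimating the standard error of `estimator`.

    gamma, s and r are hypothetical defaults chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    b = int(n ** gamma)  # subsample size b = n^gamma, much smaller than n
    per_subsample = []
    for _ in range(s):
        # Draw a subsample of b distinct points without replacement
        sub = data[rng.choice(n, size=b, replace=False)]
        estimates = np.empty(r)
        for k in range(r):
            # Full-size (n) resample from the b points, stored as multinomial counts
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates[k] = estimator(sub, counts)
        # Quality assessment on this subsample: spread of the r estimates
        per_subsample.append(np.std(estimates, ddof=1))
    # BLB output: average of the per-subsample quality assessments
    return float(np.mean(per_subsample))
```

Because the estimator receives the b points plus their counts, it must accept weights; for the mean this is simply a weighted average. An estimator that cannot use weights would need the counts expanded back into n points, which gives up the memory savings.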
Findings
BLB is statistically correct and reliable.
BLB performs well on large datasets and in parallel computing environments.
Hyperparameter selection for BLB is effective and practical.
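The s subsample computations share no state, which is what lets BLB map onto parallel hardware. The sketch below again assumes the mean as the estimator and uses hypothetical names (subsample_quality, parallel_blb) and defaults. It runs each subsample as an independent task in a thread pool and vectorizes the r resamples as an (r, b) count matrix. A thread pool keeps the example self-contained; a real deployment would more likely spread tasks across processes or cluster nodes.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def subsample_quality(args):
    # One independent BLB task: draw b points, run r full-size resamples,
    # return the standard deviation of the mean across those resamples.
    data, b, r, seed = args
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    sub = data[rng.choice(n, size=b, replace=False)]
    counts = rng.multinomial(n, np.full(b, 1.0 / b), size=r)  # shape (r, b)
    estimates = counts @ sub / n  # one weighted mean per resample
    return float(np.std(estimates, ddof=1))

def parallel_blb(data, gamma=0.7, s=8, r=100, workers=4, seed=0):
    data = np.asarray(data, dtype=float)
    b = int(len(data) ** gamma)
    # Independent, reproducible seeds for each subsample task
    seeds = np.random.SeedSequence(seed).spawn(s)
    tasks = [(data, b, r, ss) for ss in seeds]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        qualities = list(pool.map(subsample_quality, tasks))
    # Combine step: average the per-subsample quality assessments
    return float(np.mean(qualities))
```

Seeding each task from SeedSequence.spawn keeps the result reproducible no matter which worker runs which task.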
Abstract
The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we present the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality. BLB is well suited to modern parallel and distributed computing architectures and retains the generic applicability, statistical efficiency, and favorable theoretical properties of the bootstrap. We provide the results of an extensive empirical and theoretical investigation of BLB's behavior, including a study of its statistical correctness, its large-scale implementation and performance, selection of hyperparameters, and performance on real data.
Taxonomy
Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference
