The Big Data Bootstrap

Ariel Kleiner (UC Berkeley); Ameet Talwalkar (UC Berkeley); Purnamrita Sarkar (UC Berkeley); Michael Jordan (UC Berkeley)

arXiv:1206.6415·cs.LG·July 3, 2012·ICML·44 cites

TL;DR

The paper introduces the Bag of Little Bootstraps (BLB), a computationally efficient procedure for assessing estimator quality on large datasets. BLB is well suited to parallel and distributed computing and retains the bootstrap's generic applicability, statistical efficiency, and favorable theoretical properties.

Contribution

The paper proposes BLB, which combines features of the bootstrap and subsampling, and supports it with an extensive empirical and theoretical investigation aimed at large-scale data analysis.

Findings

01 BLB is statistically correct and reliable.

02 BLB performs well on large datasets and in parallel computing environments.

03 Hyperparameter selection for BLB is effective and practical.

Abstract

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we present the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality. BLB is well suited to modern parallel and distributed computing architectures and retains the generic applicability, statistical efficiency, and favorable theoretical properties of the bootstrap. We provide the results of an extensive empirical and theoretical investigation of BLB's behavior, including a study of its statistical correctness, its large-scale implementation and performance, selection of hyperparameters, and performance on real data.
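
The abstract describes BLB only at a high level. As a rough illustration of how bootstrap and subsampling can be combined, the following is a minimal sketch, not the authors' implementation. It assumes the commonly described form of the procedure: draw a few small subsets of size b = n^gamma without replacement, resample each subset up to the full size n using multinomial counts, compute a quality measure (here the standard error) per subset, and average the results. The names `blb_standard_error` and `weighted_mean`, and the default values of `gamma`, `s`, and `r`, are hypothetical choices made for this example.

```python
import numpy as np


def blb_standard_error(data, estimator, gamma=0.7, s=10, r=50, seed=0):
    """Sketch of a BLB-style standard-error estimate (illustrative only).

    data      : 1-D array of n observations
    estimator : function (values, counts) -> float, a weighted estimator
    gamma     : subset size exponent, b = n**gamma (assumed default)
    s         : number of small subsets (assumed default)
    r         : resamples drawn per subset (assumed default)
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    b = int(n ** gamma)
    per_subset = []
    for _ in range(s):
        # Small subset of b distinct points, drawn without replacement.
        subset = rng.choice(data, size=b, replace=False)
        estimates = []
        for _ in range(r):
            # A size-n resample of the subset, stored as b multinomial
            # counts instead of n explicit points.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates.append(estimator(subset, counts))
        # Quality measure for this subset: spread of the resampled estimates.
        per_subset.append(np.std(estimates, ddof=1))
    # Average the per-subset quality measures.
    return float(np.mean(per_subset))


def weighted_mean(values, counts):
    """Sample mean of a resample represented by per-point counts."""
    return np.average(values, weights=counts)


if __name__ == "__main__":
    sample = np.random.default_rng(1).normal(size=10_000)
    print(blb_standard_error(sample, weighted_mean))
```

Each resample touches only b distinct points, so the estimator works on a weighted subset rather than n explicit observations, and the outer loop over subsets is independent across iterations, which is what makes this style of procedure easy to distribute across workers.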

Taxonomy

Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference