A subsampled double bootstrap for massive data

Srijan Sengupta, Stanislav Volgushev, and Xiaofeng Shao

arXiv:1508.01126 · stat.ME · August 6, 2015 · 1 citation

Open Access

TL;DR

This paper introduces the subsampled double bootstrap, a computationally efficient resampling method for massive datasets. The authors report that it improves on the Bag of Little Bootstraps (BLB) in running time, coverage, and ease of implementation.

Contribution

The paper proposes the subsampled double bootstrap, a resampling method for large-scale data that is consistent under mild conditions and is faster and easier to implement than both BLB and the traditional bootstrap.

Findings

01. The method is consistent for independent and dependent data.

02. It outperforms BLB in running time and coverage.

03. Numerical simulations demonstrate its practical advantages.

Abstract

The bootstrap is a popular and powerful method for assessing precision of estimators and inferential methods. However, for massive datasets which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation and its feasibility is questionable even with modern parallel computing platforms. Recently Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed a method called BLB (Bag of Little Bootstraps) for massive data which is more computationally scalable with little sacrifice of statistical accuracy. Building on BLB and the idea of fast double bootstrap, we propose a new resampling method, the subsampled double bootstrap, for both independent data and time series data. We establish consistency of the subsampled double bootstrap under mild conditions for both independent and dependent cases. Methodologically, the subsampled double bootstrap is superior to BLB in…
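
To make the idea in the abstract concrete, here is a minimal Python sketch of how a subsampled double bootstrap might look for independent data. It is an illustration under stated assumptions, not the authors' implementation: it assumes that each iteration draws one small random subset (as in BLB) and then takes a single full-size resample from that subset (as in the fast double bootstrap), and that a basic bootstrap interval is formed from the resulting differences. The names `sdb_confidence_interval`, `weighted_mean`, `subset_size`, and `n_iter` are hypothetical, and the time series case is not covered.

```python
import numpy as np


def weighted_mean(values, weights):
    """Example estimator: the mean of `values` under resampling weights."""
    return np.average(values, weights=weights)


def sdb_confidence_interval(data, estimator, subset_size, n_iter=500,
                            alpha=0.05, seed=0):
    """Sketch of a subsampled double bootstrap interval (assumed form).

    Each iteration draws a fresh subset of `subset_size` points, then a
    single resample of nominal size n from that subset. The resample is
    stored as multinomial counts, so only `subset_size` values are touched.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    uniform = np.full(subset_size, 1.0 / subset_size)
    roots = np.empty(n_iter)

    for r in range(n_iter):
        # Step 1: random subset, drawn without replacement.
        subset = data[rng.choice(n, size=subset_size, replace=False)]
        # Step 2: one resample of size n from the subset, as counts.
        counts = rng.multinomial(n, uniform)
        # Step 3: difference between resample and subset estimates.
        roots[r] = estimator(subset, counts) - estimator(subset, np.ones(subset_size))

    # Basic bootstrap interval centred on the full-data estimate.
    theta_hat = estimator(data, np.ones(n))
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return theta_hat - q_hi, theta_hat - q_lo
```

A call such as `sdb_confidence_interval(x, weighted_mean, subset_size=int(len(x) ** 0.6))` returns a lower and an upper bound. The exponent 0.6 is only an example choice of subset size, not a recommendation taken from the paper.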


Taxonomy

Topics: Statistical Methods and Inference · Risk and Portfolio Optimization · Monetary Policy and Economic Impact