A subsampled double bootstrap for massive data
Srijan Sengupta, Stanislav Volgushev, and Xiaofeng Shao

TL;DR
This paper introduces the subsampled double bootstrap, a computationally efficient resampling method for massive datasets that improves on the Bag of Little Bootstraps (BLB) in speed, coverage, and ease of use.
Contribution
It proposes the subsampled double bootstrap, a resampling method for large-scale data that is shown to be consistent and is presented as faster and easier to implement than both BLB and the traditional bootstrap.
Findings
The method is consistent under mild conditions for both independent data and dependent (time series) data.
It outperforms BLB in running time and coverage.
Numerical simulations demonstrate its practical advantages.
Abstract
The bootstrap is a popular and powerful method for assessing the precision of estimators and inferential methods. However, for massive datasets, which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation, and its feasibility is questionable even with modern parallel computing platforms. Recently, Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed a method for massive data called BLB (Bag of Little Bootstraps), which is more computationally scalable with little sacrifice of statistical accuracy. Building on BLB and the idea of fast double bootstrap, we propose a new resampling method, the subsampled double bootstrap, for both independent data and time series data. We establish consistency of the subsampled double bootstrap under mild conditions for both independent and dependent cases. Methodologically, the subsampled double bootstrap is superior to BLB in…
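The abstract above does not spell out the algorithm, so the following is only a minimal sketch of the general idea for independent data, under stated assumptions: draw many small random subsets, take a single full-size resample from each subset (the "fast double bootstrap" step), and pool the resulting roots to approximate the sampling distribution. The function names (`sdb_confidence_interval`, `weighted_mean`), the unscaled root, the basic-interval construction, and the subset size of roughly n^0.6 are illustrative choices, not details taken from the paper.

```python
import numpy as np


def weighted_mean(values, weights):
    """Illustrative estimator: the mean of `values` under integer resample counts."""
    return np.average(values, weights=weights)


def sdb_confidence_interval(data, estimator, subset_size, num_subsets,
                            alpha=0.05, seed=0):
    """Sketch of a subsampled-double-bootstrap style interval (i.i.d. data only).

    `estimator(values, weights)` must accept a weight vector so that a
    size-n resample of a size-b subset can be stored as b counts.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    uniform = np.full(subset_size, 1.0 / subset_size)
    unit_weights = np.ones(subset_size)
    roots = np.empty(num_subsets)

    for j in range(num_subsets):
        # Step 1: a random subset of size b, drawn without replacement.
        idx = rng.choice(n, size=subset_size, replace=False)
        subset = data[idx]
        # Step 2: ONE resample of size n from that subset, kept as counts.
        counts = rng.multinomial(n, uniform)
        # Step 3: the root compares the resample estimate to the subset estimate.
        roots[j] = estimator(subset, counts) - estimator(subset, unit_weights)

    # Assumption: a basic bootstrap interval centred on the full-data estimate.
    theta_hat = estimator(data, np.ones(n))
    q_low, q_high = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return theta_hat - q_high, theta_hat - q_low


# Usage on synthetic data; b is set to about n**0.6 purely for illustration.
example = np.random.default_rng(42).normal(loc=5.0, scale=1.0, size=20_000)
b = int(len(example) ** 0.6)
print(sdb_confidence_interval(example, weighted_mean, subset_size=b, num_subsets=200))
```

Storing each resample as multinomial counts keeps the per-iteration cost proportional to the subset size rather than to n, which is the property that makes this family of methods attractive for massive data. The sketch does not cover the time series case mentioned in the abstract.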
Taxonomy
Topics: Statistical Methods and Inference · Risk and Portfolio Optimization · Monetary Policy and Economic Impact
