Optimal Subsampling Bootstrap for Massive Data
Yingying Ma, Chenlei Leng, Hansheng Wang

TL;DR
This paper introduces a new hyperparameter selection method for subsampling bootstrap techniques, optimizing accuracy and computational efficiency for large datasets, and demonstrates its effectiveness through simulations.
Contribution
It develops a closed-form hyperparameter tuning framework for subsampling bootstrap methods, enhancing their performance on massive data.
Findings
Optimal hyperparameters improve bootstrap accuracy
Framework reduces computational costs
Simulation results show performance gains
Abstract
The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla bootstrap is no longer computationally feasible for many modern massive datasets because it must repeatedly resample the entire data. Several improvements to the bootstrap have therefore been proposed in recent years, which assess the quality of estimators by subsampling the full dataset before resampling the subsamples. Naturally, the performance of these modern subsampling methods is influenced by tuning parameters such as the size of subsamples, the number of subsamples, and the number of resamples per subsample. In this paper, we develop a novel methodology for selecting these tuning parameters. Formulated as an optimization problem to find the optimal value of some measure of accuracy of…
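To make the three tuning parameters concrete, here is a minimal Python sketch of a generic subsample-then-resample procedure in the style of the bag of little bootstraps. It is not the selection methodology the paper proposes. The function name, the choice of the sample mean as the estimator, and all numeric values are illustrative assumptions that do not come from the paper.

```python
import random
import statistics
from collections import Counter


def subsampling_bootstrap_se(data, subsample_size, n_subsamples, n_resamples, seed=0):
    """Estimate the standard error of the sample mean (hypothetical helper).

    subsample_size: size of each subsample
    n_subsamples:   number of subsamples
    n_resamples:    number of resamples per subsample
    """
    rng = random.Random(seed)
    n = len(data)
    per_subsample_se = []
    for _ in range(n_subsamples):
        # Step 1: draw a small subsample without replacement.
        subsample = rng.sample(data, subsample_size)
        estimates = []
        for _ in range(n_resamples):
            # Step 2: resample n points from the subsample with replacement,
            # kept as counts so only subsample_size distinct values are touched.
            counts = Counter(rng.choices(range(subsample_size), k=n))
            estimates.append(sum(subsample[i] * c for i, c in counts.items()) / n)
        # Step 3: spread of the resampled estimates within this subsample.
        per_subsample_se.append(statistics.stdev(estimates))
    # Step 4: average the per-subsample assessments.
    return statistics.fmean(per_subsample_se)


# Illustrative usage on synthetic data (all values are arbitrary assumptions).
data_rng = random.Random(42)
data = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
se_hat = subsampling_bootstrap_se(
    data, subsample_size=100, n_subsamples=5, n_resamples=20
)
print(f"estimated standard error of the mean: {se_hat:.4f}")
```

In this sketch the estimator is evaluated n_subsamples × n_resamples times, and each evaluation only touches subsample_size distinct values, so all three parameters affect both the cost and the accuracy of the estimate. That trade-off is why their choice matters.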
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference
