On the Subbagging Estimation for Massive Data
Tao Zou, Xian Li, Xuan Liang, Hansheng Wang

TL;DR
This paper proposes a subbagging (subsample aggregating) estimation method for large datasets that respects computer memory constraints. It proves the estimator's consistency and asymptotic normality, and it demonstrates the method's practical efficiency through simulations and a real-data analysis.
Contribution
It introduces a subbagging approach for memory-constrained estimation, together with a theoretical framework based on incomplete $U$-statistics with an infinite-order kernel, and shows that the resulting estimator achieves $\sqrt{N}$-consistency and asymptotic normality.
Findings
The subbagging estimator achieves $\sqrt{N}$-consistency and asymptotic normality.
Compared with the full-sample estimator, its asymptotic variance has an inflation rate of $1/\alpha$, i.e., it is $(1 + 1/\alpha)$ times the full-sample asymptotic variance.
Simulations and a real-data analysis confirm the estimator's accuracy and computational efficiency.
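The variance finding can be checked with a small Monte Carlo experiment. This is a minimal sketch under stated assumptions: we read the $1/\alpha$ inflation rate as an asymptotic variance of $(1 + 1/\alpha)$ times the full-sample variance, with $\alpha = k m / N$ for $m$ subsamples of size $k$ drawn from $N$ observations. To keep the subsample variance in closed form, the estimator is the sample mean of i.i.d. $N(0, 1)$ data (an illustrative choice, not the paper's general setting), and the helper names `subbagging_mean` and `variance_ratio` are hypothetical. For the sample mean, the exact finite-sample ratio is $1 + (N - k)/(k m)$, which approaches $1 + 1/\alpha$ when $k \ll N$.

```python
import numpy as np


def subbagging_mean(x, k, m, rng):
    """Average of m subsample means (hypothetical helper).

    Each subsample has k points drawn uniformly without replacement.
    Subsamples are drawn independently of one another, so they may overlap.
    """
    n = x.shape[0]
    means = [x[rng.choice(n, size=k, replace=False)].mean() for _ in range(m)]
    return float(np.mean(means))


def variance_ratio(n, k, m, reps=2000, seed=0):
    """Monte Carlo estimate of Var(subbagging mean) / Var(full-sample mean).

    Data are i.i.d. N(0, 1), so the full-sample mean has variance exactly 1/n.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        x = rng.standard_normal(n)  # a fresh dataset for each replication
        estimates[r] = subbagging_mean(x, k, m, rng)
    return estimates.var(ddof=1) * n  # divide by the full-sample variance 1/n


if __name__ == "__main__":
    n, k, m = 2000, 100, 20
    alpha = k * m / n
    print("alpha:", alpha)
    print("empirical ratio:", round(variance_ratio(n, k, m), 3))
    print("exact ratio 1 + (n - k)/(k m):", 1 + (n - k) / (k * m))
    print("heuristic 1 + 1/alpha:", 1 + 1 / alpha)
```

With $N = 2000$, $k = 100$ and $m = 20$ (so $\alpha = 1$), the exact ratio is 1.95 and the heuristic value is 2. The empirical ratio should land near these, up to a Monte Carlo error of roughly 0.06.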
Abstract
This article introduces subbagging (subsample aggregating) estimation approaches for big data analysis under the memory constraints of computers. Specifically, for a whole dataset of size $N$, $m_N$ subsamples are drawn at random, and each subsample, whose size $k_N$ is chosen to meet the memory constraint, is sampled uniformly without replacement. Aggregating the subsample estimators yields the subbagging estimator. To analyze its theoretical properties, we adapt the theory of incomplete $U$-statistics with an infinite-order kernel to allow the drawn subsamples to overlap. Using this framework, we show that with a proper selection of the hyperparameters $k_N$ and $m_N$, the subbagging estimator achieves $\sqrt{N}$-consistency and asymptotic normality under the condition $(k_N m_N)/N \to \alpha \in$…
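The sampling-and-aggregation procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes simple averaging as the aggregation rule, uses ordinary least squares as the per-subsample estimator, and keeps the full data in memory only for convenience (in a genuinely memory-constrained setting, each subsample's rows would be read from disk). The names `subbagging_estimate` and `ols` are hypothetical.

```python
import numpy as np


def subbagging_estimate(X, y, estimator, k, m, seed=None):
    """Average `estimator` over m subsamples of size k (hypothetical helper).

    Each subsample is drawn uniformly without replacement from the N rows,
    independently of the others, so subsamples may overlap. Only k rows are
    passed to `estimator` at a time.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    if not 0 < k <= n:
        raise ValueError("subsample size k must satisfy 0 < k <= N")
    total = None
    for _ in range(m):
        idx = rng.choice(n, size=k, replace=False)
        est = np.asarray(estimator(X[idx], y[idx]), dtype=float)
        total = est if total is None else total + est
    return total / m  # simple average of the m subsample estimates


def ols(X, y):
    """Least-squares coefficients for a single (sub)sample."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, p = 100_000, 3
    X = np.column_stack([np.ones(N), rng.standard_normal((N, p - 1))])
    beta = np.array([1.0, -2.0, 0.5])
    y = X @ beta + rng.standard_normal(N)
    k, m = 2_000, 50  # k * m / N = 1.0, i.e. alpha = 1
    print("subbagging: ", subbagging_estimate(X, y, ols, k, m, seed=2))
    print("full sample:", ols(X, y))
```

Here $k m / N = 1$, matching $\alpha = 1$. The subbagging coefficients should be close to the full-sample OLS fit, with somewhat larger sampling variability, in line with the variance inflation described in the findings.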
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Fuzzy Systems and Optimization
