Maximum-Variance-Reduction Stratification for Improved Subsampling
Dingyi Wang, Haiying Wang, and Qingpei Hu

TL;DR
This paper introduces MVRS, a stratification method that enhances subsampling efficiency by reducing estimator variance with minimal additional computational cost.
Contribution
The paper proposes a novel stratification mechanism, MVRS, that improves estimation efficiency in subsampling by targeting asymptotic variance reduction.
Findings
MVRS significantly reduces estimator variance in experiments.
MVRS improves accuracy over existing subsampling methods.
MVRS incurs only linear additional computational cost.
Abstract
Subsampling is a widely used and effective approach for addressing the computational challenges posed by massive datasets. Substantial progress has been made in developing non-uniform, probability-based subsampling schemes that prioritize more informative observations. We propose a novel stratification mechanism that can be combined with existing subsampling designs to further improve estimation efficiency. We establish the estimator's asymptotic normality and quantify the resulting efficiency gains, which enables a principled procedure for selecting stratification variables and interval boundaries that target reductions in asymptotic variance. The resulting algorithm, Maximum-Variance-Reduction Stratification (MVRS), achieves significant improvements in estimation efficiency while incurring only linear additional computational cost. MVRS is applicable to both non-uniform and uniform…
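To make the idea of combining stratification with subsampling concrete, here is a minimal sketch that compares a uniform subsample mean against a stratified subsample mean on synthetic data. It is not the MVRS algorithm: the strata boundaries are picked by hand rather than selected to maximize variance reduction, the allocation is plain proportional allocation, and every name in it (build_strata, stratified_mean, the simulated z and y) is a hypothetical chosen for illustration.

```python
import bisect
import random
import statistics


def build_strata(y, z, boundaries):
    """Group responses y into strata defined by cut points on the variable z."""
    strata = [[] for _ in range(len(boundaries) + 1)]
    for yi, zi in zip(y, z):
        strata[bisect.bisect_right(boundaries, zi)].append(yi)
    return [s for s in strata if s]  # drop empty strata


def stratified_mean(strata, N, n, rng):
    """Estimate the full-data mean from a proportionally allocated subsample."""
    est = 0.0
    for s in strata:
        # Proportional allocation; rounding means the total may differ slightly from n.
        n_k = min(len(s), max(1, round(n * len(s) / N)))
        est += len(s) / N * statistics.fmean(rng.sample(s, n_k))
    return est


# Synthetic data (an assumption): the response y depends strongly on z.
rng = random.Random(0)
N, n, reps = 5000, 100, 300
z = [rng.random() for _ in range(N)]
y = [5.0 * zi + rng.gauss(0.0, 0.5) for zi in z]
true_mean = statistics.fmean(y)

# Hand-picked boundaries; MVRS would choose the variable and boundaries itself.
strata = build_strata(y, z, [0.25, 0.5, 0.75])

# Repeat both designs to compare the sampling variance of the two estimators.
unif = [statistics.fmean(rng.sample(y, n)) for _ in range(reps)]
strat = [stratified_mean(strata, N, n, rng) for _ in range(reps)]
var_unif = statistics.variance(unif)
var_strat = statistics.variance(strat)
print(f"uniform: {var_unif:.5f}  stratified: {var_strat:.5f}")
```

Because y varies much less within each stratum than across the full dataset, the stratified estimator should show a noticeably smaller variance here. Building the strata takes a single pass over the data, which is in the same spirit as the linear additional cost the abstract describes.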
