Sampling with replacement vs Poisson sampling: a comparative study in optimal subsampling
Jing Wang, Jiahui Zou, HaiYing Wang

TL;DR
This paper compares subsampling with replacement and Poisson sampling, analyzing their estimation efficiency and providing optimal probabilities to improve subsampling methods for large data sets.
Contribution
It offers a rigorous theoretical comparison of the two sampling procedures and derives optimal subsampling probabilities for improved estimation efficiency.
Findings
Poisson subsampling may have higher estimation efficiency than subsampling with replacement.
Optimal subsampling probabilities minimize estimator variance.
Algorithms based on optimal probabilities are effective in practice.
Abstract
Faced with massive data, subsampling is a commonly used technique to improve computational efficiency, and using nonuniform subsampling probabilities is an effective approach to improve estimation efficiency. For computational efficiency, subsampling is often implemented with replacement or through Poisson subsampling. However, no rigorous investigation has been performed to study the difference between the two subsampling procedures such as their estimation efficiency and computational convenience. This paper performs a comparative study on these two different sampling procedures. In the context of maximizing a general target function, we first derive asymptotic distributions for estimators obtained from the two sampling procedures. The results show that the Poisson subsampling may have a higher estimation efficiency. Based on the asymptotic distributions for both subsampling with…
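To make the two procedures concrete, here is a minimal Python sketch of subsampling with replacement and Poisson subsampling, each paired with an inverse-probability-weighted estimate of a population mean. It is an illustration under assumptions, not the paper's algorithm: the mean is a toy stand-in for the general target function, the function names are hypothetical, and the nonuniform probabilities are an arbitrary choice rather than the optimal probabilities the paper derives.

```python
import numpy as np


def sample_with_replacement(probs, n, rng):
    """Draw exactly n indices i.i.d. from `probs`; an index may repeat."""
    return rng.choice(len(probs), size=n, replace=True, p=probs)


def poisson_sample(probs, n, rng):
    """Keep each index independently with probability min(1, n * probs[i]).

    The subsample size is random, with expectation at most n.
    """
    incl = np.minimum(1.0, n * probs)
    keep = rng.random(len(probs)) < incl
    return np.flatnonzero(keep), incl


def weighted_mean_replacement(x, probs, idx):
    """Weighted mean estimate: each draw is weighted by 1 / (n * probs[i])."""
    n = len(idx)
    return np.sum(x[idx] / (n * probs[idx])) / len(x)


def weighted_mean_poisson(x, incl, idx):
    """Weighted mean estimate: each kept point is weighted by 1 / incl[i]."""
    return np.sum(x[idx] / incl[idx]) / len(x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, n, reps = 20_000, 500, 200

    x = rng.normal(loc=1.0, scale=2.0, size=N)
    # Arbitrary nonuniform probabilities (an assumption for illustration only).
    probs = 1.0 + np.abs(x)
    probs /= probs.sum()

    est_r, est_p = [], []
    for _ in range(reps):
        idx_r = sample_with_replacement(probs, n, rng)
        est_r.append(weighted_mean_replacement(x, probs, idx_r))
        idx_p, incl = poisson_sample(probs, n, rng)
        est_p.append(weighted_mean_poisson(x, incl, idx_p))

    print("full-data mean:           ", x.mean())
    print("with replacement, variance:", np.var(est_r))
    print("Poisson, variance:         ", np.var(est_p))
```

The Poisson version makes one independent keep-or-drop decision per data point, so it can be run in a single pass over the data, at the cost of a random subsample size. Sampling with replacement fixes the subsample size at n but needs the full probability vector before any draw is made.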
Taxonomy
Topics: Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms · Water Systems and Optimization
