Estimation for Monotone Sampling: Competitiveness and Customization
Edith Cohen

TL;DR
This paper develops a framework for constructing admissible, order-optimal estimators for monotone sampling schemes, enabling customizable, efficient, and competitive data summaries for various data analysis tasks.
Contribution
It introduces a method to derive order-optimal estimators tailored to specific data priorities, including the novel L* and U* estimators, improving over traditional approaches.
Findings
The L* estimator is 4-competitive and dominates the Horvitz-Thompson estimator.
The estimators are easy to apply and customizable based on data priorities.
The framework enhances the effectiveness of monotone sampling in massive data analysis.
Abstract
Random samples are lossy summaries which allow queries posed over the data to be approximated by applying an appropriate estimator to the sample. The effectiveness of sampling, however, hinges on estimator selection. The choice of estimators is subject to global requirements, such as unbiasedness and range restrictions on the estimate value, and ideally, we seek estimators that are both efficient to derive and apply and admissible (not dominated, in terms of variance, by other estimators). Nevertheless, for a given data domain, sampling scheme, and query, there are many admissible estimators. We study the choice of admissible nonnegative and unbiased estimators for monotone sampling schemes. Monotone sampling schemes are implicit in many applications of massive data set analysis. Our main contribution is general derivations of admissible estimators with desirable properties. We…
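As a point of reference for the baseline named in the Findings, here is a minimal sketch of the classical Horvitz-Thompson estimator on a toy monotone scheme. It is not the paper's L* or U* construction. The scheme itself is an assumption made for illustration: a single nonnegative value is revealed exactly when a uniform seed is at most p = min(1, value / threshold), so a lower seed never reveals less. The names `ht_estimate`, `ht_mean`, and `threshold` are hypothetical.

```python
import random


def ht_estimate(value, threshold, seed):
    """Horvitz-Thompson estimate of `value` from one outcome.

    Assumed toy scheme (not from the paper): the value is revealed
    iff seed <= p, where p = min(1, value / threshold).
    """
    if value <= 0:
        return 0.0
    p = min(1.0, value / threshold)  # inclusion probability
    if seed <= p:
        return value / p  # inverse-probability weighting
    return 0.0  # value not revealed: estimate is zero


def ht_mean(value, threshold, trials=100_000, rng_seed=0):
    """Monte Carlo average of the estimate over uniform seeds."""
    rng = random.Random(rng_seed)
    total = sum(
        ht_estimate(value, threshold, rng.random()) for _ in range(trials)
    )
    return total / trials
```

The estimate is nonnegative and unbiased, since its expectation is p · (value / p) = value. Averaging many draws, for example `ht_mean(2.0, 10.0)`, should therefore land close to 2.0.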
Taxonomy
Topics: Advanced Statistical Process Monitoring · Bayesian Methods and Mixture Models · Statistical Methods in Clinical Trials
