Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information
Edith Cohen, Haim Kaplan

TL;DR
This paper develops a methodology for creating optimal unbiased estimators from sampled data across multiple instances, significantly improving the accuracy of query estimates in resource-constrained data processing scenarios.
Contribution
It introduces a general approach for deriving Pareto optimal unbiased estimators that leverage partial information, surpassing traditional methods like Horvitz-Thompson for multi-instance data.
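For context, the Horvitz-Thompson baseline mentioned above can be sketched as follows. This is a minimal illustration of the traditional estimator only, not the paper's optimal estimators. It assumes a single key whose values `v1` and `v2` in two instances are sampled independently with probabilities `p1` and `p2`, and that the query is `max(v1, v2)`. The function names and this setup are hypothetical choices made for the illustration.

```python
from itertools import product


def ht_max_estimate(v1, v2, p1, p2, s1, s2):
    """Horvitz-Thompson estimate of max(v1, v2).

    s1, s2 say whether each instance's value was sampled. The estimate is
    nonzero only when both values are sampled, i.e. when the maximum can be
    determined exactly; it is then scaled by the inverse of that probability.
    """
    if s1 and s2:
        return max(v1, v2) / (p1 * p2)
    return 0.0


def outcome_distribution(v1, v2, p1, p2):
    """Yield (probability, estimate) over the four possible sampling outcomes."""
    for s1, s2 in product([True, False], repeat=2):
        prob = (p1 if s1 else 1 - p1) * (p2 if s2 else 1 - p2)
        yield prob, ht_max_estimate(v1, v2, p1, p2, s1, s2)


def expected_value(v1, v2, p1, p2):
    """Exact expectation of the estimator (illustrative helper)."""
    return sum(prob * est for prob, est in outcome_distribution(v1, v2, p1, p2))


def variance(v1, v2, p1, p2):
    """Exact variance of the estimator (illustrative helper)."""
    mean = expected_value(v1, v2, p1, p2)
    second_moment = sum(
        prob * est ** 2 for prob, est in outcome_distribution(v1, v2, p1, p2)
    )
    return second_moment - mean ** 2
```

In this sketch the expectation equals `max(v1, v2)`, so the estimator is unbiased, but outcomes where only one of the two values is sampled contribute nothing. That discarded partial information is what the paper's estimators aim to exploit.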
Findings
Significant accuracy improvements in fundamental query estimates.
Enhanced estimators for common sampling schemes.
Better utilization of partial information in multi-instance data.
Abstract
Random sampling is an essential tool in the processing and transmission of data. It is used to summarize data too large to store or manipulate and meet resource constraints on bandwidth or battery power. Estimators that are applied to the sample facilitate fast approximate processing of queries posed over the original data and the value of the sample hinges on the quality of these estimators. Our work targets data sets such as request and traffic logs and sensor measurements, where data is repeatedly collected over multiple instances: time periods, locations, or snapshots. We are interested in queries that span multiple instances, such as distinct counts and distance measures over selected records. These queries are used for applications ranging from planning to anomaly and change detection. Unbiased low-variance estimators are particularly effective as the relative error…
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Data Stream Mining Techniques
