Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments
Edith Cohen, Haim Kaplan, Subhabrata Sen

TL;DR
This paper introduces a coordinated weighted sampling framework for estimating aggregates over multiple weight assignments. Its estimators are much tighter than those of previous methods, and the framework is evaluated empirically on diverse data sets.
Contribution
The paper develops a novel sampling framework for multiple weight assignments, providing estimators that are much tighter than existing approaches.
Findings
Estimators are orders of magnitude more accurate than existing approaches.
Framework is effective across diverse datasets.
Significantly improves aggregate estimation accuracy.
Abstract
Many data sources are naturally modeled by multiple weight assignments over a set of keys: snapshots of an evolving database at multiple points in time, measurements collected over multiple time periods, requests for resources served at multiple locations, and records with multiple numeric attributes. Over such vector-weighted data we are interested in aggregates with respect to one set of weights, such as weighted sums, and in aggregates over multiple sets of weights, such as the difference. Sample-based summarization is highly effective for data sets that are too large to be stored or manipulated. The summary facilitates approximate processing of queries that may be specified after the summary was generated. Current designs, however, are geared for data sets where a single scalar weight is associated with each key. We develop a sampling framework based on coordinated…
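To make the setting concrete, here is a minimal sketch of one common way to coordinate weighted samples across weight assignments: every key gets the same hash-derived seed in every assignment, and each assignment keeps its k keys of smallest rank (seed divided by weight). This is not the paper's estimator. The function names (`shared_seed`, `bottom_k_sample`, `estimate_sum`), the salt, and the toy data are all assumptions made for illustration, and the per-assignment sum uses a standard priority-sampling style adjusted weight.

```python
import hashlib
import heapq


def shared_seed(key, salt="demo"):
    """Map a key to a reproducible pseudo-random number in (0, 1].

    Using the same seed for a key in every weight assignment is what
    coordinates the samples (assumed scheme, for illustration only).
    """
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def bottom_k_sample(weights, k):
    """Keep the k keys with the smallest rank = seed / weight.

    Returns {key: adjusted_weight}. The adjusted weight is
    max(weight, 1 / rank of the (k+1)-th key), as in priority sampling.
    """
    ranked = [(shared_seed(key) / w, key) for key, w in weights.items() if w > 0]
    smallest = heapq.nsmallest(k + 1, ranked)
    if len(smallest) <= k:
        # Fewer than k+1 positive-weight keys: the sample is the whole data set.
        return {key: weights[key] for _, key in smallest}
    threshold = 1.0 / smallest[k][0]
    return {key: max(weights[key], threshold) for _, key in smallest[:k]}


def estimate_sum(sample, predicate=lambda key: True):
    """Estimate the weighted sum over the keys that satisfy the predicate."""
    return sum(adj for key, adj in sample.items() if predicate(key))


# Toy data: the same keys measured in two time periods (hypothetical values).
period1 = {f"flow{i}": float(1 + i % 7) for i in range(200)}
period2 = {key: w * 1.1 for key, w in period1.items()}

s1 = bottom_k_sample(period1, 20)
s2 = bottom_k_sample(period2, 20)
print("keys shared by both samples:", len(set(s1) & set(s2)))
print("estimated vs. exact sum (period 1):",
      estimate_sum(s1), sum(period1.values()))
```

Because the seeds are shared, a key with similar weights in both periods tends to be sampled in both or in neither. That overlap is the property that queries spanning several assignments, such as differences, can exploit.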
Taxonomy
Topics: Data Quality and Management · Data Management and Algorithms · Data Stream Mining Techniques
