An Iterative Scheme for Leverage-based Approximate Aggregation

Shanshan Han; Hongzhi Wang; Jialin Wan; Jianzhong Li

arXiv:1711.01960·cs.DB·January 23, 2019

TL;DR

This paper introduces an iterative, leverage-based method for approximate aggregation. It reaches high accuracy from a small portion of the data, outperforms uniform sampling, and is suited to big-data scenarios.

Contribution

The paper presents a novel leverage-based iterative scheme that improves aggregation accuracy using only a small portion of the data and without recording the sampled data.

Findings

01

Achieves high accuracy with only one-third of the sample size that uniform sampling needs.

02

Does not require recording sampled data, facilitating implementation.

03

Extends easily to various execution modes, such as the online mode.

Abstract

The current data explosion poses great challenges to approximate aggregation in terms of both efficiency and accuracy. To address this problem, we propose a novel approach that calculates aggregation answers with high accuracy using only a small portion of the data. We introduce leverages to reflect individual differences in the samples from a statistical perspective. Two kinds of estimators, the leverage-based estimator and the sketch estimator (a "rough picture" of the aggregation answer), constrain each other and are iteratively improved according to the actual conditions until their difference falls below a threshold. Owing to the iteration mechanism and the leverages, our approach achieves high accuracy. Moreover, features such as not needing to record the sampled data and being easy to extend to various execution modes (e.g., the online mode) make our approach well suited to…
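The iterate-until-agreement idea in the abstract can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the paper's algorithm: the function name `iterative_estimate`, the weight formula, and the parameters `threshold` and `max_iters` are all hypothetical, and the weights below are a simple distance-based stand-in rather than the leverages the authors define. For simplicity it also keeps the sample in memory, which the paper's approach is said not to require.

```python
import random


def iterative_estimate(sample, threshold=1e-6, max_iters=100):
    """Toy loop: refine a rough 'sketch' mean and a weighted mean until they agree."""
    n = len(sample)
    sketch = sum(sample) / n  # initial rough picture: the plain sample mean
    # Fixed scale for the weights; fall back to 1.0 if every value is identical.
    spread = (sum((x - sketch) ** 2 for x in sample) / n) ** 0.5 or 1.0
    weighted = sketch
    for _ in range(max_iters):
        # Hypothetical weights (NOT the paper's leverages):
        # values far from the current sketch contribute less.
        weights = [1.0 / (1.0 + ((x - sketch) / spread) ** 2) for x in sample]
        weighted = sum(w * x for w, x in zip(weights, sample)) / sum(weights)
        # Stop once the two estimators differ by less than the threshold.
        if abs(weighted - sketch) < threshold:
            break
        sketch = weighted  # the weighted estimate becomes the new sketch
    return weighted


# Usage: estimate the average of a synthetic sample drawn around 100.
random.seed(0)
sample = [random.gauss(100.0, 15.0) for _ in range(1000)]
estimate = iterative_estimate(sample)
```

The point of the sketch is the control flow: two estimates of the same aggregate are recomputed from each other, and the loop ends when their difference drops below a threshold or the iteration cap is hit.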

Taxonomy

Topics: Data Management and Algorithms · Data Mining Algorithms and Applications · Advanced Database Systems and Queries