Stream Aggregation Through Order Sampling

Nick Duffield; Yunhong Xu; Liangzhen Xia; Nesreen Ahmed; Minlan Yu

arXiv:1703.02693·cs.DS·November 2, 2017·1 cites

Open Access

TL;DR

This paper introduces Priority-Based Aggregation (PBA), a novel single-pass stream aggregation algorithm that efficiently provides unbiased estimates of weighted sums over non-unique keys using order sampling.

Contribution

The paper presents PBA, the first algorithm to realize order sampling benefits in stream aggregation with non-unique keys, reducing computational complexity and improving accuracy.

Findings

01 Weighted relative error reduced by 40% to 65%.

02 Significant accuracy improvements over Adaptive Sample and Hold.

03 Efficient unbiased estimates of key aggregates in real-time.

Abstract

This paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority-Based Aggregation (PBA). While order sampling is a powerful and efficient method for weighted sampling from a stream of uniquely keyed items, there is no current algorithm that realizes the benefits of order sampling in the context of stream aggregation over non-unique keys. A naive approach, order sampling regardless of key and then aggregating the results, is hopelessly inefficient. In contrast, our proposed algorithm uses a single persistent random variable across the lifetime of each key in the cache, and maintains unbiased estimates of the key aggregates that can be queried at any point in the stream. The basic approach can be supplemented with a Sample and Hold pre-sampling stage with a sampling rate adaptation controlled by PBA. This approach represents a considerable reduction…
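
To make the naive baseline mentioned in the abstract concrete, the sketch below order-samples items regardless of key and only afterwards sums the per-item estimates by key. It is not PBA. It uses classic priority sampling (priority = weight / uniform draw, keep the k largest, estimate each kept item as max(weight, threshold)) as one assumed instance of order sampling, and it assumes strictly positive weights. The names `priority_sample` and `naive_aggregate` are hypothetical and do not come from the paper.

```python
import heapq
import random
from collections import defaultdict


def priority_sample(stream, k, rng):
    """Keep the k items with the largest priority w/u; return (sample, threshold)."""
    heap = []  # min-heap of (priority, seq, key, weight), at most k + 1 entries
    for seq, (key, weight) in enumerate(stream):
        u = 1.0 - rng.random()  # uniform on (0, 1], so the division is safe
        heapq.heappush(heap, (weight / u, seq, key, weight))
        if len(heap) > k + 1:
            heapq.heappop(heap)  # drop the current lowest priority
    if len(heap) <= k:
        # Fewer than k + 1 items arrived: all are kept and weights are exact.
        return [(key, w) for _, _, key, w in heap], 0.0
    threshold = heapq.heappop(heap)[0]  # the (k + 1)-th largest priority
    return [(key, w) for _, _, key, w in heap], threshold


def naive_aggregate(stream, k, rng=None):
    """Sample items ignoring their keys, then sum per-item estimates by key."""
    if rng is None:
        rng = random.Random()
    sample, threshold = priority_sample(stream, k, rng)
    totals = defaultdict(float)
    for key, weight in sample:
        # Priority-sampling estimate of a kept item's weight.
        totals[key] += max(weight, threshold)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical stream of (key, weight) pairs with repeated keys.
    stream = [("a", 2.0), ("b", 1.0), ("a", 3.0), ("c", 0.5), ("b", 4.0)]
    print(naive_aggregate(stream, k=3, rng=random.Random(7)))
```

In this sketch every arriving item competes for a slot on its own, so repeated items of one key can occupy several slots. That is one plausible reading of why the abstract calls the approach inefficient, in contrast to keeping a single persistent random variable per key.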

Taxonomy

Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Machine Learning and Algorithms