Estimating Aggregate Properties on Probabilistic Streams
Andrew McGregor, S. Muthukrishnan

TL;DR
This paper introduces new streaming algorithms for estimating aggregate properties like average and distinct counts on probabilistic data streams, with proven accuracy and space efficiency, advancing analysis of uncertain or post-processed data.
Contribution
It presents the first one-pass streaming algorithms for average and distinct count estimation on probabilistic streams, extending to other aggregates with accuracy guarantees.
Findings
First known one-pass algorithm for average estimation on probabilistic streams.
First known streaming algorithms for counting distinct items in probabilistic streams.
Algorithms operate within space constraints with provable accuracy.
Abstract
The probabilistic-stream model was introduced by Jayram et al. \cite{JKV07}. It is a generalization of the data stream model that is suited to handling "probabilistic" data, where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over a potentially very large number of classical "deterministic" streams, where each item is deterministically one of the domain values. The probabilistic model is applicable not only to analyzing streams where the input has uncertainties (such as sensor data streams that measure physical processes) but also to streams that are derived from the input data by post-processing, such as tagging or reconciling inconsistent and poor-quality data. We present streaming algorithms for computing commonly used aggregates on a probabilistic stream. We present the…
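To make the "distribution over deterministic streams" idea concrete, here is a minimal Python sketch. It is not the paper's streaming algorithm: it enumerates every possible deterministic stream by brute force, which takes exponential time and is only feasible for tiny inputs. The representation is an assumption made for illustration: each stream item is a dict mapping a domain value to its probability, and items are assumed independent of one another. The function names and the sample stream are hypothetical.

```python
from itertools import product


def possible_worlds(stream):
    """Yield (deterministic_stream, probability) for every possible world.

    Assumption: `stream` is a list of dicts {value: probability}, each
    summing to 1, and items are drawn independently of one another.
    """
    per_item = [list(dist.items()) for dist in stream]
    for combo in product(*per_item):
        values = tuple(value for value, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield values, prob


def expected_distinct_brute_force(stream):
    """Expected number of distinct values, averaged over all worlds."""
    return sum(p * len(set(world)) for world, p in possible_worlds(stream))


def expected_distinct_by_linearity(stream):
    """Same quantity via linearity of expectation.

    A value v appears in a world unless every item avoids it, so
    P(v appears) = 1 - prod_i (1 - p_i(v)) under the independence assumption.
    """
    domain = {value for dist in stream for value in dist}
    total = 0.0
    for value in domain:
        prob_absent = 1.0
        for dist in stream:
            prob_absent *= 1.0 - dist.get(value, 0.0)
        total += 1.0 - prob_absent
    return total


# Hypothetical three-item probabilistic stream over the domain {a, b, c}.
stream = [
    {"a": 0.5, "b": 0.5},
    {"a": 1.0},
    {"b": 0.25, "c": 0.75},
]

worlds = list(possible_worlds(stream))
print(len(worlds))                             # 4 deterministic streams
print(expected_distinct_brute_force(stream))   # 2.375
print(expected_distinct_by_linearity(stream))  # 2.375
```

The brute-force version shows why the number of deterministic streams grows multiplicatively with the stream length. The second function stores the whole stream and scans it once per domain value, so it is still not a small-space one-pass method; it only illustrates that an expected aggregate can sometimes be computed without enumerating worlds.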
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Data Stream Mining Techniques
