Data stream fusion for accurate quantile tracking and analysis
Massimo Cafaro, Catiuscia Melle, Italo Epicoco, Marco Pulimeno

TL;DR
This paper introduces a mergeable version of UDDSKETCH, a data sketching algorithm for accurate quantile tracking in data streams, enabling efficient parallel processing with improved accuracy over existing methods.
Contribution
It presents a fully mergeable, parallel version of UDDSKETCH, with formal correctness proofs and extensive experiments demonstrating superior accuracy in quantile estimation.
Findings
Parallel UDDSKETCH outperforms parallel DDSKETCH in accuracy
UDDSKETCH provides distribution-independent quantile guarantees
The mergeability property enables efficient distributed data stream processing
Abstract
UDDSKETCH is a recent algorithm for accurate tracking of quantiles in data streams, derived from the DDSKETCH algorithm. UDDSKETCH provides accuracy guarantees covering the full range of quantiles independently of the input distribution and greatly improves accuracy compared with DDSKETCH. In this paper we show how to compress and fuse data streams (or datasets) using UDDSKETCH data summaries: the input summaries are fused into a new summary of the union of the streams (or datasets) they processed, whilst preserving both the error and size guarantees provided by UDDSKETCH. This property of sketches, known as mergeability, enables parallel and distributed processing. We prove that UDDSKETCH is fully mergeable and introduce a parallel version of UDDSKETCH suitable for message-passing based architectures. We formally prove its correctness and compare it to a parallel…
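To make the mergeability property concrete, here is a minimal Python sketch of a UDDSKETCH-style summary. It is an illustration under stated assumptions, not the authors' implementation: the class name `Sketch`, the helper `collapse_buckets`, and the parameters `alpha` and `max_buckets` are hypothetical. It assumes strictly positive inputs, DDSKETCH-style logarithmic buckets with gamma = (1 + alpha) / (1 - alpha), a uniform collapse that folds each pair of adjacent buckets into one (squaring gamma), and that all sketches being merged start from the same initial `alpha`.

```python
import math
from collections import Counter


def collapse_buckets(buckets):
    """Fold buckets (2j-1, 2j) into bucket j, i.e. map index i to ceil(i / 2)."""
    merged = Counter()
    for i, count in buckets.items():
        merged[-(-i // 2)] += count  # integer ceil(i / 2), also for negative i
    return merged


class Sketch:
    """Hypothetical UDDSKETCH-style summary for strictly positive values."""

    def __init__(self, alpha=0.01, max_buckets=128):
        self.alpha = alpha                      # current relative-error bound
        self.gamma = (1 + alpha) / (1 - alpha)  # bucket i covers (gamma^(i-1), gamma^i]
        self.max_buckets = max_buckets
        self.buckets = Counter()
        self.collapses = 0                      # number of uniform collapses so far

    def _collapse(self):
        self.buckets = collapse_buckets(self.buckets)
        self.gamma = self.gamma ** 2
        self.alpha = 2 * self.alpha / (1 + self.alpha ** 2)
        self.collapses += 1

    def add(self, x):
        # Assumption: x > 0 (negative values and zero are not handled here).
        self.buckets[math.ceil(math.log(x) / math.log(self.gamma))] += 1
        while len(self.buckets) > self.max_buckets:
            self._collapse()

    def merge(self, other):
        """Fuse `other` into this sketch without modifying `other`."""
        theirs, their_level = Counter(other.buckets), other.collapses
        # Bring both summaries to the same gamma before adding counts.
        while self.collapses < their_level:
            self._collapse()
        while their_level < self.collapses:
            theirs, their_level = collapse_buckets(theirs), their_level + 1
        self.buckets.update(theirs)  # Counter.update adds the counts
        while len(self.buckets) > self.max_buckets:
            self._collapse()
        return self

    def quantile(self, q):
        n = sum(self.buckets.values())
        rank = math.floor(q * (n - 1))  # 0-based rank of the requested item
        seen = 0
        for i in sorted(self.buckets):
            seen += self.buckets[i]
            if seen > rank:
                # Point in the bucket whose relative error is at most alpha.
                return 2 * self.gamma ** i / (self.gamma + 1)
        raise ValueError("empty sketch")
```

In a parallel setting, each worker would build its own `Sketch` over its share of the stream and the results would be combined pairwise with `merge`, for example `a.merge(b)`. The merged summary then answers `quantile(q)` for the union of the inputs with relative error bounded by its current `alpha`, which grows only when a collapse is needed to respect `max_buckets`.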
Taxonomy
Topics: Fault Detection and Control Systems · Target Tracking and Data Fusion in Sensor Networks · Data Stream Mining Techniques
