Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window
Ho-Leung Chan, Tak-Wah Lam, Lap-Kei Lee, and Hing-Fung Ting

TL;DR
This paper develops communication-efficient algorithms for monitoring distributed data streams over a sliding window, enabling accurate global statistics with minimal communication.
Contribution
It introduces novel algorithms with proven bounds for counting, frequent items, and quantiles in distributed streaming environments over sliding windows.
Findings
Achieves optimal communication costs for basic counting.
Provides algorithms with near-optimal bounds for quantiles and frequent items.
Establishes matching lower bounds for the proposed algorithms.
Abstract
The past decade has witnessed many interesting algorithms for maintaining statistics over a data stream. This paper initiates a theoretical study of algorithms for monitoring distributed data streams over a time-based sliding window (which contains a variable number of items and possibly out-of-order items). The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely, basic counting, frequent items and quantiles. The worst-case communication cost over a window is bounded in bits for basic counting and in words for the remaining two statistics, with the bounds depending on the number of…
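To make the monitoring setting in the abstract concrete, here is a minimal sketch of a naive baseline for basic counting. It is not the paper's algorithm and carries none of its communication bounds. Each stream keeps an exact count of its items inside the time window and messages the root only when that count drifts from the last reported value by more than a fraction eps of the current count, so the root's sum stays within relative error eps of the true total. The names `Site`, `Root`, and `simulate`, the integer timestamps, the random arrival pattern, and the assumption that items arrive in timestamp order are all hypothetical choices made for illustration.

```python
import random
from collections import deque


class Site:
    """One data stream; tracks items whose timestamps lie in (now - window, now]."""

    def __init__(self, window, eps):
        self.window = window
        self.eps = eps
        self.times = deque()   # timestamps of items still inside the window
        self.last_sent = 0     # count most recently reported to the root

    def advance(self, now, arrivals=0):
        """Add `arrivals` items stamped `now`, expire old items, and
        return the new count if it must be reported, else None."""
        for _ in range(arrivals):
            self.times.append(now)
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()
        count = len(self.times)
        # Report only when the stale value violates the local error bound.
        if abs(count - self.last_sent) > self.eps * count:
            self.last_sent = count
            return count
        return None


class Root:
    """Coordinator; answers queries from the last reported count of each site."""

    def __init__(self, k):
        self.reported = [0] * k
        self.messages = 0

    def receive(self, site_id, count):
        self.reported[site_id] = count
        self.messages += 1

    def estimate(self):
        return sum(self.reported)


def simulate(k=4, window=10, eps=0.2, steps=200, seed=0):
    """Run a toy workload; return (worst relative error seen, messages sent)."""
    rng = random.Random(seed)
    sites = [Site(window, eps) for _ in range(k)]
    root = Root(k)
    worst = 0.0
    for now in range(steps):
        for site_id, site in enumerate(sites):
            update = site.advance(now, arrivals=rng.randint(0, 3))
            if update is not None:
                root.receive(site_id, update)
        true_total = sum(len(site.times) for site in sites)
        if true_total:
            error = abs(root.estimate() - true_total) / true_total
            worst = max(worst, error)
    return worst, root.messages


if __name__ == "__main__":
    worst_error, messages = simulate()
    print("worst relative error:", worst_error, "messages:", messages)
```

Because every site keeps its own reported value within eps of its true count, the summed estimate at the root inherits the same relative error bound. The sketch says nothing about how few messages are achievable, which is the question the paper studies.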
Taxonomy
Topics: Advanced Database Systems and Queries · Data Stream Mining Techniques · Data Management and Algorithms
