Out-of-Order Sliding-Window Aggregation with Efficient Bulk Evictions and Insertions (Extended Version)
Kanat Tangwongsan, Martin Hirzel, Scott Schneider

TL;DR
This paper presents a novel algorithm for sliding-window aggregation that efficiently handles bulk evictions and insertions, improving theoretical complexity and practical performance for out-of-order and bursty data streams.
Contribution
It introduces a sliding-window aggregation algorithm that supports both bulk eviction and bulk insertion, extending applicability to bursty, out-of-order data streams.
Findings
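Matches the theoretical complexity of the best previous out-of-order algorithms for the special case of single insert and evict.
Improves on the theoretical complexity of the best previous algorithm for bulk evict, and also outperforms it in practice.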
Outperforms naive approaches in practice for bulk insertions.
Abstract
Sliding-window aggregation is a foundational stream processing primitive that efficiently summarizes recent data. The state-of-the-art algorithms for sliding-window aggregation are highly efficient when stream data items are evicted or inserted one at a time, even when some of the insertions occur out-of-order. However, real-world streams are often not only out-of-order but also bursty, causing data items to be evicted or inserted in larger bulks. This paper introduces a new algorithm for sliding-window aggregation with bulk eviction and bulk insertion. For the special case of single insert and evict, our algorithm matches the theoretical complexity of the best previous out-of-order algorithms. For the case of bulk evict, our algorithm improves upon the theoretical complexity of the best previous algorithm for that case and also outperforms it in practice. For the case of bulk insert,…
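To make the operations named in the abstract concrete, the following is a minimal sketch of a naive baseline window, not the algorithm from the paper. The class name `NaiveWindow`, its method names, and the inclusive eviction cutoff are all assumptions made for illustration. It keeps items sorted by timestamp so that out-of-order arrivals land in the right position, and it recomputes the aggregate from scratch on every query.

```python
import bisect
from typing import Any, Callable, Iterable, List, Tuple


class NaiveWindow:
    """Hypothetical naive baseline: a timestamp-sorted list, re-folded on each query."""

    def __init__(self, combine: Callable[[Any, Any], Any], identity: Any):
        self.combine = combine      # associative operator, e.g. max or +
        self.identity = identity    # neutral element for combine
        self.times: List[int] = []  # timestamps, kept in ascending order
        self.values: List[Any] = []  # values, parallel to self.times

    def bulk_insert(self, items: Iterable[Tuple[int, Any]]) -> None:
        # Emulates bulk insert with a loop over single inserts.
        # Items may arrive out of order; each is placed by its timestamp.
        for t, v in sorted(items, key=lambda item: item[0]):
            i = bisect.bisect_right(self.times, t)
            self.times.insert(i, t)
            self.values.insert(i, v)

    def bulk_evict(self, cutoff: int) -> None:
        # Assumption: evict every item whose timestamp is <= cutoff.
        k = bisect.bisect_right(self.times, cutoff)
        del self.times[:k]
        del self.values[:k]

    def query(self) -> Any:
        # Fold the window contents in timestamp order.
        acc = self.identity
        for v in self.values:
            acc = self.combine(acc, v)
        return acc
```

With `max` as the operator, inserting `[(5, 2), (1, 7), (3, 4)]` and then evicting up to timestamp 1 leaves the items at timestamps 3 and 5 in the window. The linear-time query and the per-item list insertion are the costs that a purpose-built structure would aim to avoid.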
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Database Systems and Queries · Data Management and Algorithms
