Efficient data streaming multiway aggregation through concurrent algorithmic designs and new abstract data types
Vincenzo Gulisano, Yiannis Nikolakopoulos, Daniel Cederman, Marina Papatriantafilou, Philippas Tsigas

TL;DR
This paper introduces new lock-free data structures and algorithms for efficient multiway data aggregation in streaming systems, significantly improving throughput and latency.
Contribution
It presents novel abstract data types and lock-free algorithms tailored for high-performance multiway aggregation in data streams, enabling better parallelism and efficiency.
Findings
Up to tenfold improvement in throughput and latency
Effective support for both order-sensitive and order-insensitive aggregates
Validated on large datasets from SoundCloud and Smart Grid networks
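The following Python sketch illustrates why the findings separate order-sensitive from order-insensitive aggregates. It assumes hypothetical (timestamp, value) tuples from two input streams. A sum is commutative and associative, so every arrival order gives the same result, and per-stream partial sums can be computed in parallel and combined afterwards. A "last value" aggregate changes with the processing order and only becomes deterministic once tuples are consumed in timestamp order. The function names are illustrative and are not taken from the paper.

```python
from functools import reduce
from itertools import permutations

# Hypothetical example tuples: (timestamp, value) pairs from two input streams.
stream_a = [(1, 5), (4, 2)]
stream_b = [(2, 7), (3, 1)]
all_tuples = stream_a + stream_b

def order_insensitive_sum(tuples):
    """Commutative and associative: any processing order yields the same sum."""
    return sum(v for _, v in tuples)

def order_sensitive_last(tuples):
    """Returns the value of the last tuple processed, so it depends on order."""
    return reduce(lambda _acc, t: t[1], tuples, None)

# Evaluate both aggregates under every possible arrival order.
sums = {order_insensitive_sum(p) for p in permutations(all_tuples)}
lasts = {order_sensitive_last(p) for p in permutations(all_tuples)}

# Order-insensitive: per-stream partial aggregates combine to the same total.
partials = [order_insensitive_sum(s) for s in (stream_a, stream_b)]
combined = sum(partials)

# Order-sensitive: processing in timestamp order gives one deterministic answer.
in_order_last = order_sensitive_last(sorted(all_tuples))

print(sums)            # {15}
print(sorted(lasts))   # [1, 2, 5, 7]
print(combined)        # 15
print(in_order_last)   # 2
```

The sum needs no coordination between streams beyond a final combine, whereas the order-sensitive aggregate requires the streams to be merged by timestamp first; that merging step is where shared data structures between producers and consumers come into play.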
Abstract
Data streaming relies on continuous queries to process unbounded streams of data in a real-time fashion. It commonly demands substantial computational capacity, given that the relevant applications involve very large volumes of data. Data structures act as articulation points and maintain the state of data streaming operators, potentially supporting high parallelism and balancing the work between them. Prompted by this fact, in this work we study and analyze the parallelization needs of these articulation points, focusing on the problem of streaming multiway aggregation, where large data volumes are received from multiple input streams. The analysis of the parallelization needs, as well as of the use and limitations of existing aggregate designs and their data structures, leads us to identify needs for proper shared objects that can achieve low-latency and high-throughput multiway aggregation. We…
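To make "multiway aggregation" over multiple input streams concrete, here is a minimal, single-threaded Python sketch. It assumes that each source delivers tuples in non-decreasing timestamp order and uses a common watermark rule: a pending tuple is released once its timestamp is at most the minimum of the latest timestamps seen from every source, because no later tuple can then precede it. The class `MultiwayMerger`, the watermark rule, and the `tumbling_sums` helper are hypothetical illustrations, not the lock-free abstract data types introduced in the paper.

```python
import heapq
from collections import defaultdict

class MultiwayMerger:
    """Hypothetical sequential sketch: merges k timestamp-sorted input streams."""

    def __init__(self, num_sources):
        self.latest = [None] * num_sources  # latest timestamp seen per source
        self.heap = []                      # pending tuples, ordered by timestamp
        self.seq = 0                        # tie-breaker so values are never compared

    def add(self, source, ts, value):
        # Assumption: every source is already sorted by timestamp.
        if self.latest[source] is not None and ts < self.latest[source]:
            raise ValueError("each source must deliver tuples in timestamp order")
        self.latest[source] = ts
        heapq.heappush(self.heap, (ts, self.seq, source, value))
        self.seq += 1

    def pop_ready(self):
        """Yield, in non-decreasing timestamp order, tuples no future tuple can precede."""
        if any(t is None for t in self.latest):
            return  # some source has not delivered anything yet
        watermark = min(self.latest)
        while self.heap and self.heap[0][0] <= watermark:
            ts, _seq, source, value = heapq.heappop(self.heap)
            yield ts, source, value

def tumbling_sums(ready, size):
    """Hypothetical aggregate: sum of values per tumbling window of length `size`."""
    sums = defaultdict(int)
    for ts, _source, value in ready:
        sums[ts // size * size] += value
    return dict(sums)

merger = MultiwayMerger(num_sources=2)
merger.add(0, 1, 10)
merger.add(0, 3, 20)
merger.add(1, 2, 5)
first = list(merger.pop_ready())    # watermark = min(3, 2) = 2
merger.add(1, 6, 1)
second = list(merger.pop_ready())   # watermark = min(3, 6) = 3

print(first)                               # [(1, 0, 10), (2, 1, 5)]
print(second)                              # [(3, 0, 20)]
print(tumbling_sums(first + second, 2))    # {0: 10, 2: 25}
```

In this sketch a single thread both inserts and releases tuples; the point of the shared objects discussed in the abstract is to let many producer and consumer threads perform these steps concurrently, which a plain heap guarded by no synchronization cannot do.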
