QPOPSS: Query and Parallelism Optimized Space-Saving for Finding Frequent Stream Elements
Victor Jarlow, Charalampos Stylianopoulos, Marina Papatriantafilou

TL;DR
QPOPSS is a parallel algorithm for efficiently identifying frequent stream elements, offering strong concurrency guarantees, high accuracy, and minimal memory use, suitable for demanding real-time data analytics.
Contribution
The paper introduces QPOPSS, a novel parallel Space-Saving algorithm with concurrency guarantees, optimized for high throughput and accuracy in streaming data environments.
Findings
Linear scalability in multi-threaded throughput
Highest accuracy among compared methods
Significantly reduced memory footprint
Abstract
The frequent elements problem, a key component in demanding stream-data analytics, involves selecting elements whose occurrence exceeds a user-specified threshold. Fast, memory-efficient ε-approximate synopsis algorithms select all frequent elements but may overestimate them depending on ε (a user-defined parameter). Evolving applications demand performance only achievable by parallelization. However, algorithmic guarantees concerning concurrent updates and queries have been overlooked. We propose Query and Parallelism Optimized Space-Saving (QPOPSS), providing concurrency guarantees. The design includes an implementation of the Space-Saving algorithm supporting fast queries, implying minimal overlap with concurrent updates. QPOPSS integrates this with the distribution of work and fine-grained synchronization among threads, swiftly balancing high throughput, high…
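For readers unfamiliar with the underlying technique, the following is a minimal sketch of the classic sequential Space-Saving algorithm that the abstract builds on. It is not the QPOPSS design: it has no query optimization, no work distribution, and no thread synchronization. The class name SpaceSaving, the method names update and frequent, and the parameter name phi (the user-specified frequency threshold, as a fraction of the stream length) are hypothetical names chosen for illustration. The sketch assumes the usual choice of ceil(1/ε) counters.

```python
from math import ceil


class SpaceSaving:
    """Sequential Space-Saving sketch (illustrative, not the QPOPSS implementation)."""

    def __init__(self, epsilon):
        # Assumption: ceil(1/epsilon) counters, the standard sizing for this algorithm.
        self.capacity = ceil(1 / epsilon)
        self.counts = {}  # element -> estimated count (never an underestimate)
        self.errors = {}  # element -> maximum possible overestimation
        self.n = 0        # number of stream elements seen so far

    def update(self, x):
        self.n += 1
        if x in self.counts:
            # Already monitored: just increment its counter.
            self.counts[x] += 1
        elif len(self.counts) < self.capacity:
            # A free counter is available: start monitoring x.
            self.counts[x] = 1
            self.errors[x] = 0
        else:
            # Evict the element with the smallest count and let x inherit that count.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[x] = floor + 1
            self.errors[x] = floor

    def frequent(self, phi):
        # Report every monitored element whose estimate exceeds phi * n.
        threshold = phi * self.n
        return {x: c for x, c in self.counts.items() if c > threshold}


# Usage: four counters (epsilon = 0.25) over a short stream.
ss = SpaceSaving(epsilon=0.25)
for item in ["a"] * 6 + ["b"] * 3 + ["c", "d", "e", "f"]:
    ss.update(item)
print(ss.frequent(phi=0.2))
```

The linear scan for the minimum counter keeps the sketch short; practical implementations typically use a structure that finds the minimum in constant or logarithmic time.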
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Semantic Web and Ontologies
