Sublime: Sublinear Error & Space for Unbounded Skewed Streams
Navid Eslami, Ioana O. Bercea, Rasmus Pagh, Niv Dayan

TL;DR
Sublime is a framework that generalizes frequency estimation sketches for skewed, unbounded data streams by dynamically adjusting the size and number of counters, improving both accuracy and memory efficiency.
Contribution
Sublime introduces a general approach for adapting frequency estimation sketches through dynamic counter elongation and expansion, addressing the memory inefficiency and accuracy degradation these sketches exhibit on streaming workloads.
Findings
Sublime reduces memory usage under skewed workloads.
It improves estimation accuracy over existing sketches.
Sublime maintains competitive performance with better accuracy and memory tradeoffs.
Abstract
Modern stream processing systems often need to track the frequency of distinct keys in a data stream in real time. Since maintaining exact counts can require a prohibitive amount of memory, many applications rely on compact, probabilistic data structures known as frequency estimation sketches to approximate them. However, mainstream frequency estimation sketches fall short in two critical aspects. First, they are memory-inefficient under skewed workloads because they use uniformly sized counters to count the keys, thus wasting memory on storing the leading zeros of many small counts. Second, their estimation error deteriorates at least linearly with the length of the stream, which may grow indefinitely, because they rely on a fixed number of counters. We present Sublime, a framework that generalizes frequency estimation sketches to address these challenges. To reduce memory footprint…
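The two shortcomings named in the abstract can be seen in a small experiment. The sketch below is not Sublime. It is a plain Count-Min sketch, used here as one well-known example of a frequency estimation sketch with a fixed number of uniformly sized counters. The class and function names, the 32-bit counter width, the BLAKE2 hashing scheme, and the 1/rank key distribution are all assumptions made for illustration.

```python
import hashlib
import random
from collections import Counter

NUM_KEYS = 1000  # size of the hypothetical key universe


class CountMinSketch:
    """Baseline Count-Min sketch: fixed counter count, fixed counter width."""

    def __init__(self, width, depth, counter_bits=32):
        self.width = width
        self.depth = depth
        self.counter_bits = counter_bits  # assumed uniform counter size
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One independent-looking hash per row, derived by salting with the row id.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def estimate(self, key):
        # Collisions only ever inflate a counter, so the minimum is the best guess.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))

    def used_bit_fraction(self):
        # Fraction of allocated bits that are not leading zeros.
        used = sum(c.bit_length() for row in self.rows for c in row)
        return used / (self.width * self.depth * self.counter_bits)


def skewed_stream(n, num_keys, rng):
    # Zipf-like skew: the key of rank k is drawn with weight 1/k.
    weights = [1.0 / (k + 1) for k in range(num_keys)]
    return rng.choices(range(num_keys), weights=weights, k=n)


def mean_overestimate(sketch, exact, num_keys):
    # Average of (estimate - true count) over the whole key universe.
    return sum(sketch.estimate(k) - exact[k] for k in range(num_keys)) / num_keys


rng = random.Random(0)
sketch = CountMinSketch(width=64, depth=4)
exact = Counter()
errors = []
# Feed the same sketch a short prefix, then a much longer continuation.
for batch_len in (5_000, 45_000):
    for key in skewed_stream(batch_len, NUM_KEYS, rng):
        sketch.add(key)
        exact[key] += 1
    errors.append(mean_overestimate(sketch, exact, NUM_KEYS))

print("mean overestimate after each batch:", errors)
print("fraction of counter bits in use:", sketch.used_bit_fraction())
```

Because the number of counters never changes, every additional item can only add to the collision noise in the counters it touches, so the mean overestimate cannot shrink as the stream grows. The used-bit fraction shows how much of each fixed-width counter is occupied by leading zeros under this particular setup.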
