Carbonyl4: A Sketch for Set-Increment Mixed Updates
Yikai Zhao, Yuhan Wu, Tong Yang

TL;DR
Carbonyl4 is a sketch algorithm for SET-INCREMENT Mixed (SIM) data streams. It uses two techniques, Balance Bucket and Cascading Overflow, to improve accuracy and adaptability, and it outperforms existing algorithms on four diverse datasets.
Contribution
It introduces two new techniques, Balance Bucket and Cascading Overflow, to enhance accuracy and adaptability in SIM data stream processing.
Findings
Outperforms existing algorithms in accuracy for item-level information retrieval
Holds this advantage across four diverse datasets
Supports dynamic memory shrinking via re-sampling and a heuristic approach
Abstract
In the realm of data stream processing, the advent of SET-INCREMENT Mixed (SIM) data streams necessitates algorithms that efficiently handle both SET and INCREMENT operations. We present Carbonyl4, an innovative algorithm designed specifically for SIM data streams, ensuring accuracy, unbiasedness, and adaptability. Carbonyl4 introduces two pioneering techniques: the Balance Bucket for refined variance optimization, and the Cascading Overflow for maintaining precision amidst overflow scenarios. Our experiments across four diverse datasets establish Carbonyl4's supremacy over existing algorithms, particularly in terms of accuracy for item-level information retrieval and adaptability to fluctuating memory requirements. The versatility of Carbonyl4 is further demonstrated through its dynamic memory shrinking capability, achieved via a re-sampling and a heuristic approach. The source codes…
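To make the two operation types concrete, here is a minimal sketch of the exact (non-sketch) semantics of a SIM stream, assuming that SET overwrites a key's value and INCREMENT adds to it. This is not Carbonyl4 itself: it is only the ground-truth bookkeeping that an approximate algorithm would be compared against. The function name `apply_sim_stream` and the `(operation, key, value)` tuple layout are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical update format: (operation, key, value).
Update = Tuple[str, str, int]


def apply_sim_stream(updates: Iterable[Update]) -> Dict[str, int]:
    """Replay a SET-INCREMENT mixed stream exactly, using one counter per key."""
    state: Dict[str, int] = defaultdict(int)
    for op, key, value in updates:
        if op == "SET":
            # Assumed semantics: SET overwrites whatever value the key had.
            state[key] = value
        elif op == "INCREMENT":
            # Assumed semantics: INCREMENT adds to the current value (0 if unseen).
            state[key] += value
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return dict(state)


stream = [
    ("INCREMENT", "a", 3),
    ("SET", "b", 10),
    ("INCREMENT", "b", -4),
    ("SET", "a", 1),
    ("INCREMENT", "a", 2),
]
result = apply_sim_stream(stream)
print(result)
```

Exact bookkeeping like this needs memory proportional to the number of distinct keys, which is the cost a sketch is meant to avoid by answering per-key queries approximately in bounded memory.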
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Distributed systems and fault tolerance
