SF-sketch: A Two-stage Sketch for Data Streams
Tong Yang, Lingtong Liu, Yibo Yan, Muhammad Shahzad, Yulong Shen, Xiaoming Li, Bin Cui, Gaogang Xie

TL;DR
The SF-sketch is a novel two-stage probabilistic data structure for data streams that significantly improves accuracy and reduces memory usage while maintaining high processing speed, outperforming existing sketches like CM-sketch.
Contribution
We introduce the SF-sketch, a two-stage sketch with a small Slim-subsketch and a large Fat-subsketch, achieving higher accuracy and lower memory footprint than prior methods.
Findings
SF-sketch outperforms CM-sketch by up to 33.1 times in accuracy.
SF-sketch maintains high speed comparable to the best prior sketches.
Extensive evaluations demonstrate the effectiveness of SF-sketch over existing approaches.
Abstract
A sketch is a probabilistic data structure used to record frequencies of items in a multi-set. Sketches are widely used in various fields, especially those that involve processing and storing data streams. In streaming applications with high data rates, a sketch "fills up" very quickly. Thus, its contents are periodically transferred to the remote collector, which is responsible for answering queries. In this paper, we propose a new sketch, called Slim-Fat (SF) sketch, which has a significantly higher accuracy compared to prior art, a much smaller memory footprint, and at the same time achieves the same speed as the best prior sketch. The key idea behind our proposed SF-sketch is to maintain two separate sketches: a small sketch called Slim-subsketch and a large sketch called Fat-subsketch. The Slim-subsketch is periodically transferred to the remote collector for answering queries…
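A Count-Min sketch (the CM-sketch baseline named above) keeps a few rows of counters. Each item is hashed once per row, and its frequency estimate is the minimum of its counters, so the estimate is never below the true count. The minimal Python sketch below shows that baseline. It also shows a hypothetical two-stage wrapper in which a large local "fat" sketch feeds a small "slim" sketch, and only the slim sketch is shipped to the collector.

The class names `CountMinSketch` and `TwoStageSketch`, the blake2b-based hashing, and the rule that raises slim counters to the fat estimate are all illustrative assumptions. The abstract above is truncated before it says how SF-sketch actually updates its two subsketches.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: `depth` rows of `width` counters (illustrative)."""

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        self.width = width
        self.depth = depth
        self.seed = seed
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salt one stable hash with the seed and row number to get
        # `depth` different hash functions.
        key = f"{self.seed}:{row}:{item}".encode()
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def positions(self, item: str) -> list[tuple[int, int]]:
        # (row, column) of the counter this item maps to in each row.
        return [(r, self._index(item, r)) for r in range(self.depth)]

    def insert(self, item: str, count: int = 1) -> None:
        # Non-negative counts only: every mapped counter is incremented.
        for r, i in self.positions(item):
            self.rows[r][i] += count

    def query(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum is an
        # overestimate (or exact), never an underestimate.
        return min(self.rows[r][i] for r, i in self.positions(item))


class TwoStageSketch:
    """HYPOTHETICAL two-stage layout, not the paper's exact algorithm.

    A large 'fat' sketch stays local; a small 'slim' sketch is the only
    part exported. ASSUMPTION: after each insert, the item's slim counters
    are raised (never lowered) to the fat sketch's current estimate.
    """

    def __init__(self, slim_width: int, fat_width: int, depth: int) -> None:
        self.fat = CountMinSketch(fat_width, depth, seed=1)
        self.slim = CountMinSketch(slim_width, depth, seed=2)

    def insert(self, item: str, count: int = 1) -> None:
        self.fat.insert(item, count)
        target = self.fat.query(item)  # >= true count of item so far
        for r, i in self.slim.positions(item):
            if self.slim.rows[r][i] < target:
                self.slim.rows[r][i] = target

    def export_slim(self) -> CountMinSketch:
        # Snapshot of the slim sketch that a collector could query.
        snapshot = CountMinSketch(self.slim.width, self.slim.depth, self.slim.seed)
        snapshot.rows = [row[:] for row in self.slim.rows]
        return snapshot


sf = TwoStageSketch(slim_width=64, fat_width=512, depth=3)
for x in ["a"] * 5 + ["b"] * 2 + ["c"]:
    sf.insert(x)
collector_copy = sf.export_slim()
print(collector_copy.query("a"))  # at least 5; never an underestimate
```

Under this assumed update rule, the exported slim sketch keeps the Count-Min guarantee of one-sided error. Each of an item's slim counters is raised to at least the fat estimate, which is at least the item's true count, and later inserts only raise counters further. How much accuracy the slim copy loses relative to the fat one is exactly the question the paper's own design addresses, and this sketch makes no claim about it.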
Taxonomy
Topics: Algorithms and Data Compression · Caching and Content Delivery · Data Stream Mining Techniques
