SF-sketch: A Two-stage Sketch for Data Streams

Tong Yang; Lingtong Liu; Yibo Yan; Muhammad Shahzad; Yulong Shen; Xiaoming Li; Bin Cui; Gaogang Xie

arXiv:1701.04148 · cs.DS · February 8, 2017 · 1 cites

TL;DR

The SF-sketch is a novel two-stage probabilistic data structure for data streams that significantly improves accuracy and reduces memory usage while maintaining high processing speed, outperforming existing sketches like CM-sketch.

Contribution

We introduce the SF-sketch, a two-stage sketch with a small Slim-subsketch and a large Fat-subsketch, achieving higher accuracy and lower memory footprint than prior methods.

Findings

01. SF-sketch outperforms CM-sketch by up to 33.1 times in accuracy.

02. SF-sketch maintains high speed comparable to the best prior sketches.

03. Extensive evaluations demonstrate the effectiveness of SF-sketch over existing approaches.

Abstract

A sketch is a probabilistic data structure used to record frequencies of items in a multi-set. Sketches are widely used in various fields, especially those that involve processing and storing data streams. In streaming applications with high data rates, a sketch "fills up" very quickly. Thus, its contents are periodically transferred to the remote collector, which is responsible for answering queries. In this paper, we propose a new sketch, called Slim-Fat (SF) sketch, which has a significantly higher accuracy compared to prior art, a much smaller memory footprint, and at the same time achieves the same speed as the best prior sketch. The key idea behind our proposed SF-sketch is to maintain two separate sketches: a small sketch called Slim-subsketch and a large sketch called Fat-subsketch. The Slim-subsketch is periodically transferred to the remote collector for answering queries…
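
For reference, the kind of frequency sketch the abstract describes can be illustrated with a minimal Count-Min (CM) sketch, the baseline named in the findings. This is not the SF-sketch itself, whose Slim/Fat update rules are not given in the text above. The class name `CountMinSketch`, the `width` and `depth` parameters, and the use of salted BLAKE2b as the per-row hash are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, derived by salting BLAKE2b with the row id
        # (an assumed choice; any family of independent hashes would do).
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def insert(self, item: str, count: int = 1) -> None:
        # Increment the one counter the item maps to in every row.
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def query(self, item: str) -> int:
        # Hash collisions can only inflate counters, so the minimum over rows
        # is the tightest estimate and never falls below the true frequency.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


sketch = CountMinSketch(width=1024, depth=4)
for x in ["a", "b", "a", "c", "a", "b"]:
    sketch.insert(x)
estimate_a = sketch.query("a")  # an overestimate or exact: at least 3
```

Because collisions only ever add to a counter, a CM-sketch query can overestimate but never underestimate. That overestimation error is what sketches in this line of work aim to shrink.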

Taxonomy

Topics: Algorithms and Data Compression · Caching and Content Delivery · Data Stream Mining Techniques