A Formal Analysis of the Count-Min Sketch with Conservative Updates
Younes Ben Mazziane (UCA, NEO, Inria), Sara Alouf (UCA, NEO, Inria), Giovanni Neglia (UCA, NEO, Inria)

TL;DR
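This paper provides a new theoretical analysis of the Count-Min Sketch with Conservative Updates (CMS-CU), deriving upper bounds on the expected value and the CCDF of the estimation error under an i.i.d. request process. The bounds yield improved precision estimates for heavy-hitter detection and improved configuration rules for CMS-CU.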
Contribution
It introduces a novel analytical approach to derive bounds on CMS-CU's error, enhancing its theoretical understanding and practical configuration.
Findings
Derived new upper bounds on the expected value and the CCDF of the CMS-CU estimation error
Improved accuracy estimates for heavy-hitter detection
Validated bounds on synthetic and real data traces
Abstract
Count-Min Sketch with Conservative Updates (CMS-CU) is a popular algorithm to approximately count items' appearances in a data stream. Despite CMS-CU's widespread adoption, the theoretical analysis of its performance is still wanting because of its inherent difficulty. In this paper, we propose a novel approach to study CMS-CU and derive new upper bounds on the expected value and the CCDF of the estimation error under an i.i.d. request process. Our formulas can be successfully employed to derive improved estimates for the precision of heavy-hitter detection methods and improved configuration rules for CMS-CU. The bounds are evaluated both on synthetic and real traces.
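For readers unfamiliar with the data structure, the conservative update rule can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `CountMinSketchCU`, the `depth` and `width` parameters, and the use of salted BLAKE2b as the per-row hash are all assumptions made for illustration. On each arrival, only the counters that currently equal the item's minimum are raised, which is what distinguishes CMS-CU from the plain Count-Min Sketch.

```python
import hashlib


class CountMinSketchCU:
    """Illustrative Count-Min Sketch with Conservative Updates (hypothetical names)."""

    def __init__(self, depth=4, width=64):
        self.depth = depth  # number of rows, one hash function per row
        self.width = width  # number of counters per row
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Assumption: salted BLAKE2b stands in for a family of independent hashes.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def estimate(self, item):
        # The estimate is the minimum over the item's counters, one per row.
        return min(self.table[r][self._index(r, item)] for r in range(self.depth))

    def update(self, item):
        # Conservative update: raise only the counters below (current minimum + 1).
        cells = [(r, self._index(r, item)) for r in range(self.depth)]
        target = min(self.table[r][c] for r, c in cells) + 1
        for r, c in cells:
            if self.table[r][c] < target:
                self.table[r][c] = target


# Small synthetic stream: two frequent items plus 200 items seen once each.
stream = ["a"] * 50 + ["b"] * 20 + [f"x{i}" for i in range(200)]
sketch = CountMinSketchCU(depth=4, width=64)
for item in stream:
    sketch.update(item)

print(sketch.estimate("a"), sketch.estimate("b"))
```

The estimate never falls below the true count, so the estimation error studied in the paper is the non-negative gap between `estimate(item)` and the item's actual number of appearances.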
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
