Count-min sketch with variable number of hash functions: an experimental study
Éric Fusy, Gregory Kucherov

TL;DR
This paper presents an experimental study of conservative Count-Min sketch, revealing its behavior under various conditions and demonstrating that assigning a variable number of hash functions can improve space efficiency and accuracy.
Contribution
It provides new experimental insights into conservative Count-Min's behavior and proposes a method to optimize space and error by varying the number of hash functions per element.
Findings
The error of conservative Count-Min depends on the load factor and on the input distribution.
Assigning a variable number of hash functions to different elements can reduce space while keeping the error low.
Experimental results validate the effectiveness of the proposed approach.
Abstract
Conservative Count-Min, an improved version of Count-Min sketch [Cormode, Muthukrishnan 2005], is an online-maintained hashing-based data structure summarizing element frequency information without storing elements themselves. Although several works attempted to analyze the error that can be made by Count-Min, the behavior of this data structure remains poorly understood. In [Fusy, Kucherov 2022], we demonstrated that under the uniform distribution of input elements, the error of conservative Count-Min follows two distinct regimes depending on its load factor. In this work, we present a series of experimental results that give new insights into the behavior of conservative Count-Min. Our contributions are twofold. On the one hand, we provide a detailed experimental analysis of the behavior of Count-Min sketch in different regimes and under several representative probability…
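To make the data structure concrete, here is a minimal sketch of the conservative update rule, with an optional per-element number of hash functions. It is an illustration only, not the authors' implementation: the class name, the single shared counter array, the BLAKE2-based hashing, and the `k` argument are all assumptions introduced here, and the rule the paper uses to choose the number of hash functions for each element is not reproduced.

```python
import hashlib


class ConservativeCountMin:
    """Illustrative conservative Count-Min over one shared counter array.

    Assumption: all hash functions map into the same array; the layout
    used in the paper's experiments may differ.
    """

    def __init__(self, num_counters, num_hashes=2):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes  # default number of hash functions

    def _cells(self, item, k=None):
        # Derive k counter positions for the item from k salted hashes.
        k = self.num_hashes if k is None else k
        cells = set()
        for i in range(k):
            digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            cells.add(int.from_bytes(digest, "big") % len(self.counters))
        return cells

    def add(self, item, k=None):
        # Conservative update: only counters currently at the minimum grow.
        cells = self._cells(item, k)
        low = min(self.counters[c] for c in cells)
        for c in cells:
            if self.counters[c] == low:
                self.counters[c] = low + 1

    def estimate(self, item, k=None):
        # The estimate is the minimum counter; it never undercounts.
        return min(self.counters[c] for c in self._cells(item, k))


sketch = ConservativeCountMin(num_counters=1024, num_hashes=2)
for word in ["a", "a", "a", "b"]:
    sketch.add(word)
print(sketch.estimate("a"), sketch.estimate("b"))
```

If an element is inserted with a non-default `k`, the same `k` must be passed when querying it, since the counter positions depend on it.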
Taxonomy
Topics: Caching and Content Delivery · Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques
