Optimized Learned Count-Min Sketch
Kyosuke Nishishita, Atsuki Sato, Yusuke Matsui

TL;DR
This paper introduces OptLCMS, an improved learned Count-Min Sketch that partitions data, analytically optimizes parameters, and offers faster construction with theoretical error guarantees, matching the accuracy of prior learned methods.
Contribution
It proposes a novel partitioned approach with analytically derived parameters and dynamic programming optimization, reducing empirical tuning and providing theoretical error guarantees.
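For context on what an analytic guarantee looks like, the classical Count-Min Sketch bound is restated below. This is the standard result for a single CMS, not the per-partition derivation from the paper, which this summary does not reproduce. The symbols $w$ (width), $d$ (depth), $\varepsilon$, $\delta$, $N$ (total count), $f_x$ (true frequency), and $\hat{f}_x$ (estimate) are notation assumed here.

```latex
% Classical CMS sizing and guarantee (non-negative updates, one hash per row).
\[
  w = \left\lceil \frac{e}{\varepsilon} \right\rceil, \qquad
  d = \left\lceil \ln \frac{1}{\delta} \right\rceil
\]
% The estimate never undershoots, and overshoots by more than
% \varepsilon N with probability at most \delta:
\[
  f_x \le \hat{f}_x, \qquad
  \Pr\!\left[\, \hat{f}_x > f_x + \varepsilon N \,\right] \le e^{-d} \le \delta
\]
% Sketch of why: in each row the expected collision mass is at most
% N / w \le \varepsilon N / e, so by Markov's inequality a single row
% exceeds \varepsilon N with probability at most 1/e; the d rows are
% independent and the estimate takes their minimum.
```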
Findings
OptLCMS builds faster than existing learned CMS methods.
It achieves a lower intolerable error probability.
OptLCMS maintains estimation accuracy comparable to LCMS.
Abstract
Count-Min Sketch (CMS) is a memory-efficient data structure for estimating the frequency of elements in a multiset. Learned Count-Min Sketch (LCMS) enhances CMS with a machine learning model to reduce estimation error under the same memory usage, but suffers from slow construction due to empirical parameter tuning and lacks theoretical guarantees on intolerable error probability. We propose Optimized Learned Count-Min Sketch (OptLCMS), which partitions the input domain and assigns each partition to its own CMS instance, with CMS parameters analytically derived for fixed thresholds, and thresholds optimized via dynamic programming with approximate feasibility checks. This reduces the need for empirical validation, enabling faster construction while providing theoretical guarantees under these assumptions. OptLCMS also allows explicit control of the allowable error threshold, improving…
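To make the structure described in the abstract concrete, here is a minimal Python sketch of a classical CMS together with a partitioned wrapper that routes each item to its own CMS instance. It is an illustration under assumptions, not the authors' implementation: `PartitionedSketch`, `score_fn` (a stand-in for the learned model), and the hand-picked thresholds are hypothetical, and the analytic parameter derivation and dynamic programming over thresholds from the paper are not reproduced.

```python
import bisect
import hashlib
import math


class CountMinSketch:
    """Classical Count-Min Sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    @classmethod
    def from_error_bounds(cls, epsilon: float, delta: float) -> "CountMinSketch":
        # Classical sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
        return cls(math.ceil(math.e / epsilon), math.ceil(math.log(1.0 / delta)))

    def _index(self, item: str, row: int) -> int:
        # One salted hash per row; blake2b stands in for a pairwise-independent family.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum over rows never undershoots.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


class PartitionedSketch:
    """Hypothetical wrapper: one CMS per score interval (assumption, not the paper's code)."""

    def __init__(self, thresholds, sketches, score_fn):
        # `thresholds` are ascending cut points, so k thresholds define k + 1 partitions.
        if len(sketches) != len(thresholds) + 1:
            raise ValueError("need exactly len(thresholds) + 1 sketches")
        self.thresholds = list(thresholds)
        self.sketches = list(sketches)
        self.score_fn = score_fn  # stand-in for a learned frequency model

    def _partition(self, item: str) -> int:
        return bisect.bisect_right(self.thresholds, self.score_fn(item))

    def add(self, item: str, count: int = 1) -> None:
        self.sketches[self._partition(item)].add(item, count)

    def estimate(self, item: str) -> int:
        return self.sketches[self._partition(item)].estimate(item)


if __name__ == "__main__":
    # Toy score function: items prefixed "hot" are treated as likely frequent.
    toy_score = lambda item: 0.9 if item.startswith("hot") else 0.1
    sketch = PartitionedSketch(
        thresholds=[0.5],
        sketches=[CountMinSketch(64, 3), CountMinSketch(16, 3)],
        score_fn=toy_score,
    )
    for _ in range(3):
        sketch.add("hot1")
    sketch.add("cold1")
    print(sketch.estimate("hot1"), sketch.estimate("cold1"))
```

Because items in different partitions never share counters, each partition's width and depth can be chosen independently, which is the degree of freedom the abstract says OptLCMS sets analytically rather than by empirical tuning.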
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Database Systems and Queries · Machine Learning and Data Classification
