Improving Count-Mean Sketch as the Leading Locally Differentially Private Frequency Estimator for Large Dictionaries
Mingen Pan

TL;DR
This paper enhances the Count-Mean Sketch algorithm for frequency estimation under local differential privacy, correcting previous errors, optimizing it with randomized response, and demonstrating its superior performance and efficiency for large dictionaries.
Contribution
It revises and optimizes the private CMS algorithm, proving its theoretical and empirical superiority as the leading LDP frequency estimator for large dictionaries.
Findings
CMS optimized with randomized response outperforms CMS variants that use other known perturbations in worst-case MSE, $l_1$ loss, and $l_2$ loss.
Pairwise-independent hashing suffices, reducing communication costs.
Randomness is necessary for CMS to be correct, and the resulting communication cost, though low, is unavoidable.
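The finding on pairwise-independent hashing can be illustrated with a minimal sketch. It assumes the classic affine family h(x) = ((a*x + b) mod P) mod m, which is exactly pairwise independent over the integers mod P and only approximately uniform after the final reduction mod m. The prime `PRIME`, the helper names, and the bit-count function are illustrative assumptions, not the paper's construction. The point is that describing one hash function takes only two numbers below P, so its size grows with the logarithm of the dictionary size.

```python
import random

# Assumption: a prime larger than the dictionary size (here the Mersenne prime 2^31 - 1).
PRIME = 2_147_483_647


def sample_hash(rng, prime=PRIME):
    """Draw the pair (a, b) that identifies one hash function in the family."""
    a = rng.randrange(prime)
    b = rng.randrange(prime)
    return a, b


def apply_hash(params, x, m, prime=PRIME):
    """Map dictionary item x (an integer below prime) to a bucket in [0, m)."""
    a, b = params
    return ((a * x + b) % prime) % m


def bits_to_send(prime=PRIME):
    """Bits needed to transmit (a, b): two integers below prime."""
    return 2 * prime.bit_length()


rng = random.Random(0)
params = sample_hash(rng)
buckets = [apply_hash(params, x, m=16) for x in range(1000)]
```

With this family, a client that picks its own hash function can describe it in `bits_to_send()` bits, regardless of how many items the dictionary holds, as long as the prime exceeds the dictionary size.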
Abstract
This paper identifies that a group of recent locally differentially private (LDP) algorithms for frequency estimation, including all the Hadamard-matrix-based algorithms, are equivalent to the private Count-Mean Sketch (CMS) algorithm with different parameters. Therefore, we revisit the private CMS, correct errors in the original CMS paper regarding expectation and variance, modify the CMS implementation to eliminate existing bias, and optimize CMS using randomized response (RR) as the perturbation method. The optimized CMS with RR is shown to outperform CMS variants with other known perturbations in reducing the worst-case mean squared error (MSE), $l_1$ loss, and $l_2$ loss. Additionally, we prove that pairwise-independent hashing is sufficient for CMS, reducing its communication cost to the logarithm of the cardinality of all possible values (i.e., a dictionary). As a result, the…
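The pipeline the abstract describes (hash the value, perturb the bucket with randomized response, debias on the server) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the affine hash family, the prime `PRIME`, the function names, and the choice of `m` and `epsilon` in the usage lines are all hypothetical. The estimator assumes that two distinct values collide with probability about 1/m, which holds only approximately for this hash family.

```python
import math
import random

PRIME = 2_147_483_647  # assumption: a prime larger than the dictionary size


def client_report(value, m, epsilon, rng):
    """Hash the value with a freshly drawn hash, then apply m-ary randomized response."""
    a, b = rng.randrange(PRIME), rng.randrange(PRIME)
    hashed = ((a * value + b) % PRIME) % m
    keep = math.exp(epsilon) / (math.exp(epsilon) + m - 1)
    if rng.random() < keep:
        out = hashed
    else:
        # Pick uniformly among the other m - 1 buckets (requires m >= 2).
        out = rng.randrange(m - 1)
        if out >= hashed:
            out += 1
    return (a, b), out


def estimate_frequency(reports, x, m, epsilon):
    """Debiased estimate of the fraction of clients holding x."""
    keep = math.exp(epsilon) / (math.exp(epsilon) + m - 1)
    # A client holding x matches with probability keep; any other client
    # matches with probability about 1/m (collision or perturbation).
    hits = sum(1 for (a, b), y in reports if ((a * x + b) % PRIME) % m == y)
    return (hits / len(reports) - 1 / m) / (keep - 1 / m)


rng = random.Random(1)
n, m, epsilon = 30_000, 8, 2.0
# Half of the clients hold value 7; the rest hold values drawn from 100..199.
values = [7 if i % 2 == 0 else rng.randrange(100, 200) for i in range(n)]
reports = [client_report(v, m, epsilon, rng) for v in values]
est_present = estimate_frequency(reports, 7, m, epsilon)
est_absent = estimate_frequency(reports, 5000, m, epsilon)
```

In this setup `est_present` should land near 0.5 and `est_absent` near 0, with noise that shrinks as the number of reports grows. Each report carries only the pair (a, b) and one bucket index.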
Taxonomy
Topics: Music and Audio Processing
