Cardinality Estimators do not Preserve Privacy
Damien Desfontaines, Andreas Lochbihler, and David Basin

TL;DR
Cardinality estimators like HyperLogLog cannot simultaneously provide strong privacy guarantees and maintain their accuracy and aggregation properties, so their sketches should be treated as being as sensitive as raw data.
Contribution
We formalize a privacy notion for cardinality estimators, demonstrate their incompatibility with strong privacy guarantees, and analyze existing algorithms' privacy risks.
Findings
Existing estimators leak significant private information, even when the multisets are large
Strong aggregation requirements conflict with privacy preservation
Mitigation strategies are proposed for practical applications
Abstract
Cardinality estimators like HyperLogLog are sketching algorithms that estimate the number of distinct elements in a large multiset. Their use in privacy-sensitive contexts raises the question of whether they leak private information. In particular, can they provide any privacy guarantees while preserving their strong aggregation properties? We formulate an abstract notion of cardinality estimators that captures this aggregation requirement: one can merge sketches without losing precision. We propose an attacker model and a corresponding privacy definition, strictly weaker than differential privacy: we assume that the attacker has no prior knowledge of the data. We then show that if a cardinality estimator satisfies this definition, then it cannot have a reasonable level of accuracy. We prove similar results for weaker versions of our definition, and analyze the privacy of existing…
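To make the aggregation requirement concrete, here is a minimal Python sketch of a HyperLogLog-style register array. It is an illustration under stated assumptions, not the paper's formal model or any library's implementation: the register count `P`, the SHA-256-based hash, and the helper names `build`, `merge`, and `may_contain` are all hypothetical choices, and the cardinality-estimation step itself is omitted. The sketch shows that merging two sketches gives exactly the sketch of the combined input, and hints at why this is a privacy problem: adding an element that was already counted never changes the sketch, so anyone holding a sketch can test candidate elements against it.

```python
import hashlib

P = 4            # assumption: 2**P = 16 registers, chosen only for illustration
M = 1 << P
REST_BITS = 64 - P


def _hash64(item: str) -> int:
    """Deterministic 64-bit hash (hypothetical choice: truncated SHA-256)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def _rank(word: int) -> int:
    """Position of the leftmost 1-bit in a REST_BITS-wide word (always >= 1)."""
    return REST_BITS - word.bit_length() + 1


def add(sketch: tuple, item: str) -> tuple:
    """Return a new sketch with `item` added; registers only ever grow."""
    h = _hash64(item)
    index = h >> REST_BITS                 # first P bits select a register
    rest = h & ((1 << REST_BITS) - 1)      # remaining bits determine the rank
    registers = list(sketch)
    registers[index] = max(registers[index], _rank(rest))
    return tuple(registers)


def build(items) -> tuple:
    """Build a sketch from an iterable of string items."""
    sketch = (0,) * M
    for item in items:
        sketch = add(sketch, item)
    return sketch


def merge(a: tuple, b: tuple) -> tuple:
    """Register-wise maximum: the sketch of the union, with no loss."""
    return tuple(max(x, y) for x, y in zip(a, b))


def may_contain(sketch: tuple, item: str) -> bool:
    """True if adding `item` would leave the sketch unchanged.

    Always True for elements that were added; a False result proves the
    element was never added. False positives are possible.
    """
    return add(sketch, item) == sketch
```

For example, `merge(build(["alice", "bob"]), build(["bob", "carol"]))` equals `build(["alice", "bob", "carol"])`, and `may_contain(build(["alice", "bob"]), "alice")` is always true. A negative answer from `may_contain` is conclusive, which is the kind of leakage that makes a sketch resemble the raw data it summarizes.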
