Differentially Private Clustering in Data Streams
Alessandro Epasto, Tamalika Mukherjee, Peilin Zhong

TL;DR
This paper introduces the first differentially private algorithms for streaming $k$-means and $k$-median clustering. The algorithms use space sublinear in the stream length $T$ and come with provable approximation guarantees.
Contribution
It presents novel differentially private streaming clustering algorithms with provable approximation guarantees and space efficiency, utilizing a new framework based on offline DP coresets.
Findings
Achieves $O(1)$-multiplicative approximation with sublinear space.
Provides $(1+\eta)$-multiplicative approximation with adjustable space complexity.
Offers algorithms with polylogarithmic dependence on stream length $T$.
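For readers unfamiliar with the terminology, guarantees of this kind are usually stated against the optimal clustering cost. The following is a sketch of that standard form; the symbols $P$, $C$, $z$, $\alpha$, $\beta$, and $\mathrm{OPT}_k$ are notation assumed here for illustration and are not taken from the summary above.

$$
\mathrm{cost}(C, P) = \sum_{p \in P} \min_{c \in C} \lVert p - c \rVert_2^{z},
\qquad
\mathrm{cost}(C, P) \le \alpha \cdot \mathrm{OPT}_k(P) + \beta
$$

Here $P$ is the set of points seen so far, $C$ is the set of $k$ output centers, $z = 2$ gives $k$-means and $z = 1$ gives $k$-median, and $\mathrm{OPT}_k(P)$ is the minimum cost over all choices of $k$ centers. The factor $\alpha$ is the multiplicative approximation (for example $O(1)$ or $1+\eta$) and $\beta$ is the additive error.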
Abstract
Clustering problems (such as $k$-means and $k$-median) are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been extensively studied in the past. However, since data privacy becomes a central concern in many real-world applications, non-private clustering algorithms may not be as applicable in many scenarios. In this work, we provide the first differentially private algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream with length at most $T$ using space that is sublinear (in $T$) in the continual release setting where the algorithm is required to output a clustering at every timestep. We achieve (1) an $O(1)$-multiplicative approximation with sublinear space and some additive error, or (2) a $(1+\eta)$-multiplicative approximation with…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques
