On the Optimal Number of Grids for Differentially Private Non-Interactive $K$-Means Clustering
Gokularam Muthukrishnan, Anshoo Tandon

TL;DR
This paper proposes a principled rule for choosing the number of grids in differentially private non-interactive $K$-means clustering, improving clustering accuracy for a given privacy budget.
Contribution
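It introduces a refined grid-size selection rule derived by minimizing an upper bound on the expected deviation in the $K$-means objective function. The widely adopted strategy, by contrast, selects a grid size that is independent of the number of clusters and relies on empirical tuning.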
Findings
The new grid selection rule improves clustering accuracy under privacy constraints.
Numerical results show the method outperforms existing techniques, especially with tight privacy budgets.
Abstract
Differentially private $K$-means clustering enables releasing cluster centers derived from a dataset while protecting the privacy of the individuals. Non-interactive clustering techniques based on privatized histograms are attractive because the released data synopsis can be reused for other downstream tasks without additional privacy loss. The choice of the number of grids for discretizing the data points is crucial, as it directly controls the quantization bias and the amount of noise injected to preserve privacy. The widely adopted strategy selects a grid size that is independent of the number of clusters and also relies on empirical tuning. In this work, we revisit this choice and propose a refined grid-size selection rule derived by minimizing an upper bound on the expected deviation in the $K$-means objective function, leading to a more principled discretization strategy for…
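To make the pipeline described in the abstract concrete, here is a minimal sketch of histogram-based non-interactive private clustering. It is an illustration under stated assumptions, not the authors' implementation: the data are assumed to lie in the unit cube, the histogram is privatized with Laplace noise of scale 1/epsilon (assuming neighboring datasets differ by adding or removing one point), and the function name `private_grid_kmeans` and parameter `cells_per_dim` are hypothetical. The number of cells is left as a free input, since the paper's selection rule is not stated in the abstract.

```python
import numpy as np

def private_grid_kmeans(points, k, cells_per_dim, epsilon, n_iter=20, seed=0):
    """Sketch: K-means on a noisy histogram of data assumed to lie in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    edges = [np.linspace(0.0, 1.0, cells_per_dim + 1)] * d

    # Quantize: count the points falling in each grid cell.
    counts, _ = np.histogramdd(points, bins=edges)

    # Privatize: Laplace noise with scale 1/epsilon (assumed sensitivity 1).
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    # Post-processing: clip negative counts so they can serve as weights.
    weights = np.clip(noisy, 0.0, None).ravel()

    # Represent each cell by its center; ordering matches counts.ravel().
    mids = (edges[0][:-1] + edges[0][1:]) / 2.0
    grid = np.stack(np.meshgrid(*([mids] * d), indexing="ij"), axis=-1)
    cell_centers = grid.reshape(-1, d)

    # Weighted Lloyd iterations that touch only the released synopsis.
    centroids = rng.uniform(0.0, 1.0, size=(k, d))
    for _ in range(n_iter):
        sq_dist = ((cell_centers[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = sq_dist.argmin(axis=1)
        for j in range(k):
            mask = assign == j
            total = weights[mask].sum()
            if total > 0:  # leave a centroid unchanged if its cluster is empty
                centroids[j] = (weights[mask, None] * cell_centers[mask]).sum(axis=0) / total
    return centroids
```

The trade-off the abstract refers to is visible in `cells_per_dim`: more cells reduce the quantization error of replacing points by cell centers, but the same noise scale is then added to many more, sparser counts.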
