Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees
Jacob Imola, Alessandro Epasto, Mohammad Mahdian, Vincent Cohen-Addad, Vahab Mirrokni

TL;DR
This paper explores differentially private algorithms for hierarchical clustering, establishing theoretical bounds, proposing approximation algorithms, and demonstrating their effectiveness on stochastic block models and real data.
Contribution
It introduces the first differentially private approximation algorithms for hierarchical clustering with provable guarantees and analyzes their performance under various models.
Findings
Lower bounds show any $\epsilon$-DP algorithm incurs $\Omega(|V|^2/\epsilon)$ additive error
A polynomial-time algorithm achieves $O(|V|^{2.5}/\epsilon)$ additive error
Private algorithms successfully recover block structures in stochastic block models
Abstract
Hierarchical Clustering is a popular unsupervised machine learning method with decades of history and numerous applications. We initiate the study of differentially private approximation algorithms for hierarchical clustering under the rigorous framework introduced by (Dasgupta, 2016). We show strong lower bounds for the problem: that any $\epsilon$-DP algorithm must exhibit $\Omega(|V|^2/\epsilon)$-additive error for an input dataset $V$. Then, we exhibit a polynomial-time approximation algorithm with $O(|V|^{2.5}/\epsilon)$-additive error, and an exponential-time algorithm that meets the lower bound. To overcome the lower bound, we focus on the stochastic block model, a popular model of graphs, and, with a separation assumption on the blocks, propose a private approximation algorithm which also recovers the blocks exactly. Finally, we perform an empirical study of our algorithms…
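For readers unfamiliar with the framework of (Dasgupta, 2016) mentioned in the abstract, the following is a minimal sketch of the cost objective it defines: each weighted pair of points is charged its weight times the number of leaves under the pair's lowest common ancestor in the tree. The nested-tuple tree encoding, the `weights` dictionary, and the function names are illustrative assumptions, not taken from the paper, and the sketch does not implement any privacy mechanism. The additive error bounds quoted above are measured against this kind of objective.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaves of a nested-tuple tree (any non-tuple is a leaf)."""
    if not isinstance(tree, tuple):
        return {tree}
    out = set()
    for child in tree:
        out |= leaves(child)
    return out


def dasgupta_cost(tree, weights):
    """Dasgupta cost: sum over pairs {i, j} of w_ij * |leaves under lca(i, j)|.

    `weights` maps frozenset({i, j}) to a non-negative similarity;
    missing pairs are treated as weight 0 (an assumption of this sketch).
    """
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    n_here = sum(len(s) for s in child_leaves)
    # Pairs that stay inside one child are charged deeper in the tree.
    cost = sum(dasgupta_cost(child, weights) for child in tree)
    # Pairs separated at this node have this node as their lowest common ancestor.
    for left, right in combinations(child_leaves, 2):
        for i in left:
            for j in right:
                cost += weights.get(frozenset((i, j)), 0.0) * n_here
    return cost


# Hypothetical toy graph: two tight pairs joined by one weaker edge.
weights = {
    frozenset(("a", "b")): 1.0,
    frozenset(("c", "d")): 1.0,
    frozenset(("b", "c")): 0.5,
}
good = (("a", "b"), ("c", "d"))  # merges the heavy pairs first
bad = (("a", "c"), ("b", "d"))   # splits the heavy pairs at the root

print(dasgupta_cost(good, weights))  # 6.0
print(dasgupta_cost(bad, weights))   # 10.0
```

A lower cost is better: the tree that keeps heavily connected points together low in the hierarchy pays less than the one that separates them at the root.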
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Random Matrices and Applications · Complexity and Algorithms in Graphs
