DPM: Clustering Sensitive Data through Separation
Johannes Liebenow, Yara Schütt, Tanya Braun, Marcel Gehrke, Florian Thaeter, Esfandiar Mohammadi

TL;DR
DPM is a privacy-preserving clustering algorithm that uses geometric separation and hyper-parameter estimation to achieve high utility and results close to non-private KMeans, while ensuring differential privacy.
Contribution
The paper introduces DPM, a novel privacy-preserving clustering method that recursively separates data and estimates hyper-parameters privately, improving utility and closeness to non-private clustering.
Findings
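DPM achieves state-of-the-art utility on the standard clustering metrics (inertia, silhouette score, and clustering accuracy).
DPM produces clustering results that are closer to the non-private KMeans baseline than those of prior privacy-preserving clustering algorithms.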
DPM preserves differential privacy while maintaining high clustering quality.
Abstract
Clustering is an important tool for data exploration where the goal is to subdivide a data set into disjoint clusters that fit well into the underlying data structure. When dealing with sensitive data, privacy-preserving algorithms aim to approximate the non-private baseline while minimising the leakage of sensitive information. State-of-the-art privacy-preserving clustering algorithms tend to output clusters that are good in terms of the standard metrics (inertia, silhouette score, and clustering accuracy); however, their clustering results strongly deviate from the non-private KMeans baseline. In this work, we present a privacy-preserving clustering algorithm called DPM that recursively separates a data set into clusters based on a geometrical clustering approach. In addition, DPM estimates most of the data-dependent hyper-parameters in a privacy-preserving way. We prove that DPM…
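To make the distinction in the abstract concrete, the following minimal sketch shows how a clustering can score reasonably on a standard metric (inertia) while its centres still differ from a baseline. It is not taken from the paper: the toy points, both sets of centres, and the `centre_deviation` helper are hypothetical, and the paper's actual measure of closeness to KMeans may differ.

```python
from math import dist


def inertia(points, centres):
    """Sum of squared Euclidean distances from each point to its nearest centre."""
    return sum(min(dist(p, c) ** 2 for c in centres) for p in points)


def centre_deviation(centres_a, centres_b):
    """Hypothetical closeness measure (an assumption, not the paper's metric):
    mean distance from each centre in centres_a to its nearest centre in centres_b."""
    return sum(min(dist(a, b) for b in centres_b) for a in centres_a) / len(centres_a)


# Toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Stand-in for the centres a non-private KMeans run would return.
baseline_centres = [(0.0, 1.0), (10.0, 1.0)]

# Stand-in for centres returned by some privacy-preserving algorithm.
private_centres = [(0.0, 1.5), (10.0, 0.5)]

print("baseline inertia:", inertia(points, baseline_centres))
print("private inertia: ", inertia(points, private_centres))
print("centre deviation:", centre_deviation(private_centres, baseline_centres))
```

Lower inertia means tighter clusters, while a lower centre deviation means the private result sits closer to the baseline. The two numbers capture different things, which is why a method can do well on one and poorly on the other.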
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Biometric Identification and Security · Face recognition and analysis
