k-Means SubClustering: A Differentially Private Algorithm with Improved Clustering Quality

Devvrat Joshi; Janvi Thakkar

arXiv:2301.02896·cs.LG·January 10, 2023

TL;DR

This paper introduces a differentially private k-means clustering algorithm that improves clustering quality by sub-clustering and selecting more probable centroids, outperforming existing methods while preserving privacy.

Contribution

The novel approach of sub-clustering and centroid selection enhances clustering quality under differential privacy constraints, with proven improvements over prior methods.

Findings

01

Clustering quality improved by 4.13 times on the Wine dataset.

02

Clustering quality improved by 2.83 times on the Breast Cancer dataset.

03

Outperforms baseline in terms of clustering quality while maintaining privacy.

Abstract

In today's data-driven world, the sensitivity of information has been a significant concern. With this data and additional information on a person's background, one can easily infer an individual's private data. Many differentially private iterative algorithms have been proposed in interactive settings to protect an individual's privacy from these inference attacks. Existing approaches compute differentially private (DP) centroids by running the iterative Lloyd's algorithm and perturbing the centroids with various DP mechanisms. These DP mechanisms do not guarantee convergence of differentially private iterative algorithms, and they degrade the quality of the clusters. Thus, in this work, we further extend the previous work on 'Differentially Private k-Means Clustering With Convergence Guarantee' by taking it as our baseline. The novelty of our approach is to sub-cluster the…
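The generic approach the abstract attributes to existing work, Lloyd's iterations with perturbed centroids, can be illustrated with a minimal sketch. This is a textbook-style version using the Laplace mechanism, not the paper's sub-clustering algorithm and not the cited baseline. The function name `dp_lloyd`, the assumption that every feature is scaled to [-1, 1], the even budget split across iterations, and the random initialization are all illustrative choices.

```python
import numpy as np

def dp_lloyd(X, k, epsilon, n_iters=5, seed=0):
    """Illustrative DP Lloyd's k-means (hypothetical helper, not the paper's method).

    Assumes every feature of X has been scaled to lie in [-1, 1].
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: split the total budget evenly across iterations
    # (sequential composition).
    eps_iter = epsilon / n_iters
    # Adding or removing one point changes one cluster's sum by at most d
    # in L1 norm and its count by 1, so the joint L1 sensitivity is d + 1.
    scale = (d + 1) / eps_iter
    # Data-independent initialization, so it consumes no privacy budget.
    centroids = rng.uniform(-1.0, 1.0, size=(k, d))
    for _ in range(n_iters):
        # Assignment step: nearest current centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: perturb each cluster's count and coordinate sums.
        for j in range(k):
            members = X[labels == j]
            noisy_count = len(members) + rng.laplace(0.0, scale)
            noisy_sum = members.sum(axis=0) + rng.laplace(0.0, scale, size=d)
            if noisy_count >= 1.0:
                # Clipping is post-processing and keeps centroids in the domain.
                centroids[j] = np.clip(noisy_sum / noisy_count, -1.0, 1.0)
            # Otherwise keep the previous centroid for this cluster.
    return centroids
```

Calling `dp_lloyd(X, k=3, epsilon=1.0)` on a scaled data matrix returns a `(3, d)` array of noisy centroids. With a small budget the injected noise can dominate the true cluster sums, which is the kind of quality degradation the abstract describes.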

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Random Matrices and Applications · Statistical Methods and Inference

Methods: k-Means Clustering