Differentially Private Federated $k$-Means Clustering with Server-Side Data
Jonathan Scott, Christoph H. Lampert, David Saulpic

TL;DR
This paper introduces FedDP-KMeans, a federated and differentially private $k$-means clustering algorithm that utilizes server-side data for initialization, achieving high accuracy and theoretical guarantees in privacy-preserving distributed clustering.
Contribution
The paper presents a novel federated differentially private $k$-means algorithm that leverages server-side data for initialization, improving clustering performance under privacy constraints.
Findings
Achieves excellent results on synthetic and real-world datasets.
Provides theoretical bounds on convergence speed and cluster identification.
Effectively uses server-side data to enhance differentially private clustering.
Abstract
Clustering is a cornerstone of data analysis that is particularly suited to identifying coherent subgroups or substructures in unlabeled data, such as the large volumes of data generated continuously today. However, in many cases traditional clustering methods are not applicable, because data are increasingly produced and stored in a distributed way, e.g. on edge devices, and privacy concerns prevent them from being transferred to a central server. To address this challenge, we present FedDP-KMeans, a new algorithm for $k$-means clustering that is fully federated as well as differentially private. Our approach leverages (potentially small and out-of-distribution) server-side data to overcome the primary challenge of differentially private clustering methods: the need for a good initialization. Combining our initialization with a simple federated DP-Lloyds algorithm we obtain an algorithm…
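The abstract names two ingredients, an initialization computed from server-side data and a federated DP-Lloyds refinement, without giving their details. The sketch below illustrates the general shape of such a pipeline under explicit assumptions; it is not the authors' FedDP-KMeans. Assumed: plain k-means++ seeding on the server data stands in for the paper's initialization; each client clips its points to an L2 norm `bound` and reports per-cluster sums and counts; the server sees only aggregated totals (e.g. via secure aggregation) and adds Gaussian noise with scales `sigma_sum` and `sigma_count` supplied directly rather than calibrated to a privacy budget. All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(server_data: List[List[float]], k: int, rng: random.Random) -> List[List[float]]:
    """Hypothetical stand-in initialization: k-means++ seeding on server-side points only."""
    centers = [list(rng.choice(server_data))]
    while len(centers) < k:
        d2 = [min(sq_dist(x, c) for c in centers) for x in server_data]
        total = sum(d2)
        if total == 0.0:
            centers.append(list(rng.choice(server_data)))  # every point already coincides with a center
            continue
        r = rng.uniform(0.0, total)
        acc = 0.0
        for x, w in zip(server_data, d2):
            acc += w
            if w > 0 and acc >= r:  # sample a point with probability proportional to d2
                centers.append(list(x))
                break
        else:
            centers.append(list(server_data[-1]))  # guard against float round-off
    return centers


def clip(x: Sequence[float], bound: float) -> List[float]:
    """Scale x so its L2 norm is at most `bound`, limiting any one point's influence."""
    norm = math.sqrt(sum(v * v for v in x))
    scale = min(1.0, bound / norm) if norm > 0 else 1.0
    return [v * scale for v in x]


def client_stats(data, centers, bound):
    """Runs on each client: per-cluster sums and counts of its norm-clipped points."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for raw in data:
        x = clip(raw, bound)
        j = min(range(k), key=lambda c: sq_dist(x, centers[c]))  # nearest center
        counts[j] += 1
        for i, v in enumerate(x):
            sums[j][i] += v
    return sums, counts


def dp_lloyd_step(clients, centers, bound, sigma_sum, sigma_count, rng):
    """Server: total the client statistics, add Gaussian noise to the totals, recompute centers.

    sigma_sum should scale with `bound` (the per-point L2 sensitivity of a sum);
    calibrating both sigmas to an (epsilon, delta) budget is not shown here.
    """
    k, d = len(centers), len(centers[0])
    tot_sums = [[0.0] * d for _ in range(k)]
    tot_counts = [0.0] * k
    for data in clients:  # assumed to be computed via secure aggregation in practice
        sums, counts = client_stats(data, centers, bound)
        for j in range(k):
            tot_counts[j] += counts[j]
            for i in range(d):
                tot_sums[j][i] += sums[j][i]
    new_centers = []
    for j in range(k):
        n = tot_counts[j] + rng.gauss(0.0, sigma_count)
        s = [v + rng.gauss(0.0, sigma_sum) for v in tot_sums[j]]
        # Keep the previous center if the noisy count is too small to divide by safely.
        new_centers.append([v / n for v in s] if n >= 1.0 else list(centers[j]))
    return new_centers


# Usage: two clients, each holding one cluster; a tiny server-side sample seeds the centers.
rng = random.Random(0)
clients = [
    [[0.01 * i, -0.01 * i] for i in range(-10, 11)],         # client near (0, 0)
    [[5 + 0.01 * i, 5 + 0.01 * i] for i in range(-10, 11)],  # client near (5, 5)
]
server_data = [[0.2, 0.1], [4.8, 5.1]]
centers = kmeanspp_init(server_data, k=2, rng=rng)
for _ in range(3):
    centers = dp_lloyd_step(clients, centers, bound=10.0, sigma_sum=0.01, sigma_count=0.01, rng=rng)
```

Adding noise once to the aggregated totals, rather than to each client's report, keeps the total noise from growing with the number of clients; the fallback to the old center when the noisy count drops below 1 avoids dividing by a near-zero or negative count.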
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Advanced Clustering Algorithms Research
