A Unified Framework for Center-based Clustering of Distributed Data
Aleksandar Armacki, Dragana Bajović, Dušan Jakovetić, Soummya Kar

TL;DR
This paper introduces a unified distributed clustering framework that enables users with local data to collaboratively find a global clustering solution over a network, applicable to various loss functions including K-means.
Contribution
The paper proposes a novel family of distributed clustering algorithms, DGC-𝓕_ρ, with a unified analysis, applicable to multiple loss functions, and guarantees convergence to meaningful fixed points.
Findings
Centers converge to fixed points under mild conditions.
As ρ increases, fixed points approach consensus.
Algorithms perform well on synthetic and real data.
Abstract
We develop a family of distributed center-based clustering algorithms that work over networks of users. In the proposed scenario, each user holds a local dataset and communicates only with its immediate neighbours, with the aim of finding a clustering of the full, joint data. The proposed family, termed Distributed Gradient Clustering (DGC-𝓕_ρ), is parametrized by ρ, controlling the proximity of users' center estimates, with 𝓕 determining the clustering loss. Our framework allows for a broad class of smooth convex loss functions, including popular clustering losses like K-means and Huber loss. Specialized to K-means and Huber loss, DGC-𝓕_ρ gives rise to novel distributed clustering algorithms DGC-KM and DGC-HL, while novel clustering losses based on Logistic and Fair functions lead to…
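To make the role of ρ concrete, here is a minimal sketch of one plausible gradient-style round for the K-means loss. It is an illustration under stated assumptions, not the paper's exact update: it assumes each user takes a gradient step on a consensus penalty over its neighbours plus its local K-means loss scaled by 1/ρ. The names `nearest_center`, `dgc_km_round`, `center_spread`, and `run_toy`, the step size, and the toy data are all hypothetical.

```python
import numpy as np

def nearest_center(points, centers):
    """Index of the closest center (squared Euclidean) for each local point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def dgc_km_round(centers, data, neighbors, rho, step):
    """One synchronous round over all users (assumed update form, see lead-in).

    centers[i] is user i's (K, d) array of center estimates, data[i] its local
    (n_i, d) dataset, neighbors[i] the list of users it can talk to.
    """
    updated = []
    for i, x_i in enumerate(centers):
        labels = nearest_center(data[i], x_i)
        grad = np.zeros_like(x_i)
        for k in range(x_i.shape[0]):
            members = data[i][labels == k]
            if len(members):
                # Gradient of 0.5 * sum ||x_i(k) - y||^2 over assigned points,
                # scaled by 1/rho (assumption).
                grad[k] = (len(members) * x_i[k] - members.sum(axis=0)) / rho
        for j in neighbors[i]:
            # Consensus term: pull towards each neighbour's current estimate.
            grad += x_i - centers[j]
        updated.append(x_i - step * grad)
    return updated

def center_spread(centers):
    """Largest distance between any two users' estimates of the same center."""
    return max(np.linalg.norm(a - b, axis=1).max() for a in centers for b in centers)

def run_toy(rho, rounds=2000, step=0.1):
    """Three users on a path graph 0-1-2, each holding shifted copies of two clusters."""
    data = [np.array([[i, 0.0], [i, 1.0], [10.0 + i, 10.0], [10.0 + i, 11.0]])
            for i in range(3)]
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    centers = [np.array([[1.0, 0.0], [9.0, 10.0]]) for _ in range(3)]
    for _ in range(rounds):
        centers = dgc_km_round(centers, data, neighbors, rho, step)
    return centers

if __name__ == "__main__":
    for rho in (1.0, 100.0):
        print(rho, center_spread(run_toy(rho)))
```

In this toy setup the users' local cluster means differ, so a small ρ leaves each user's centers close to its own data, while a large ρ pulls the estimates of different users together. That matches the qualitative claim that fixed points approach consensus as ρ increases.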
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms
Methods: Sparse Evolutionary Training · Huber loss
