Distributed k-Means and k-Median Clustering on General Topologies
Maria Florina Balcan, Steven Ehrlich, Yingyu Liang

TL;DR
This paper introduces new distributed algorithms for k-means and k-median clustering that reduce communication costs and work on general network topologies, with proven guarantees and superior experimental performance.
Contribution
The paper presents a distributed coreset construction for k-median and k-means clustering that lowers communication cost relative to earlier coreset-based methods and applies to arbitrary communication topologies.
Findings
Reduces communication complexity compared to previous methods.
Works effectively over general communication topologies.
Outperforms existing coreset-based distributed clustering algorithms in experiments.
Abstract
This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by \cite{har2004coresets}, we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We provide a distributed method for constructing a global coreset which improves over the previous methods by reducing the communication complexity, and which works over general communication topologies. Experimental results on large-scale data sets show that this approach outperforms other coreset-based distributed clustering algorithms.
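To make the coreset idea in the abstract concrete, here is a minimal sketch in plain Python of one way a global weighted coreset could be assembled from several sites for the k-means objective. It is an illustration under stated assumptions, not the paper's exact procedure: the names `seed_centers` and `distributed_coreset`, the use of k-means++-style seeding as the local solver, and the specific sampling weights are all choices made for this example. The point it shows is that each site only has to share one number (its local cost) before sampling, and that the union of the local samples and local centers forms the summary.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def seed_centers(points, k, rng):
    """Hypothetical local solver: k-means++-style seeding (an assumption)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # fewer than k distinct points at this site
            break
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def distributed_coreset(sites, k, t, seed=0):
    """Return a list of (point, weight) pairs summarizing all sites.

    sites: list of point lists, one list per node.
    t: total number of sampled points across all nodes.
    """
    rng = random.Random(seed)
    local = []
    for pts in sites:
        centers = seed_centers(pts, k, rng)
        assign = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in pts]
        cost = [sq_dist(p, centers[a]) for p, a in zip(pts, assign)]
        local.append((pts, centers, assign, cost))

    # The only values the nodes need to exchange before sampling:
    # one scalar local cost per node.
    site_costs = [sum(cost) for _, _, _, cost in local]
    total = sum(site_costs)

    # Split the t samples across nodes in proportion to local cost.
    counts = [0] * len(sites)
    if total > 0:
        for i in rng.choices(range(len(sites)), weights=site_costs, k=t):
            counts[i] += 1

    coreset = []
    for (pts, centers, assign, cost), t_i in zip(local, counts):
        # Start each center's weight at its cluster size.
        center_w = [assign.count(j) for j in range(len(centers))]
        cand = [i for i, c in enumerate(cost) if c > 0]
        if t_i > 0 and cand:
            picks = rng.choices(cand, weights=[cost[i] for i in cand], k=t_i)
            for i in picks:
                w = total / (t * cost[i])  # inverse of sampling probability
                coreset.append((pts[i], w))
                center_w[assign[i]] -= w   # keep each cluster's total weight
        coreset.extend(zip(centers, center_w))
    return coreset

# Small usage example on synthetic data (three sites, 2-D points).
data_rng = random.Random(1)
sites = [[(data_rng.gauss(s, 1.0), data_rng.gauss(-s, 1.0)) for _ in range(200)]
         for s in (0.0, 5.0, 10.0)]
summary = distributed_coreset(sites, k=3, t=50)
print(len(summary), "weighted points summarize", sum(map(len, sites)), "points")
```

In this sketch the weights of each local cluster are constructed to sum to the cluster's size, so the total weight of the summary equals the number of original points; individual center weights can come out negative as a result. A weighted clustering algorithm would then be run on the summary instead of on the full data.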
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Data Management and Algorithms
