Fast Distributed k-Means with a Small Number of Rounds

Tom Hess; Ron Visbord; Sivan Sabato

arXiv:2201.13217·cs.DC·November 14, 2023

TL;DR

This paper introduces a distributed k-means clustering algorithm that needs only 1-4 communication rounds in many cases, usually attains a lower k-means cost than k-means||, and reduces running time by exploiting a larger coordinator capacity.

Contribution

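The paper presents a distributed k-means algorithm whose cost approximation factor and number of communication rounds depend only on the coordinator's computational capacity. A built-in stopping mechanism lets it finish in fewer rounds whenever possible, and it performs better empirically than k-means||.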

Findings

01. 1-4 rounds suffice in many cases
02. Better clustering cost than k-means||
03. Reduced machine running time

Abstract

We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines, and a coordinator communicates with these machines to calculate the output clustering. Our algorithm guarantees a cost approximation factor and a number of communication rounds that depend only on the computational capacity of the coordinator. Moreover, the algorithm includes a built-in stopping mechanism, which allows it to use fewer communication rounds whenever possible. We show both theoretically and empirically that in many natural cases, indeed 1-4 rounds suffice. In comparison with the popular k-means|| algorithm, our approach allows exploiting a larger coordinator capacity to obtain a smaller number of rounds. Our experiments show that the k-means cost obtained by the proposed algorithm is usually better than the cost obtained by k-means||, even when…
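
The k-means cost that the abstract compares is the sum, over all data points, of the squared distance to the nearest chosen center. Because this sum splits across machines, a coordinator can evaluate a candidate set of centers by collecting one number per machine. The sketch below illustrates only that cost evaluation, not the proposed algorithm itself; the function names (`sq_dist`, `local_cost`, `distributed_cost`) and the toy data are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(p: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def local_cost(points: List[Point], centers: List[Point]) -> float:
    """Cost contribution of one machine: every point pays the squared
    distance to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def distributed_cost(shards: List[List[Point]], centers: List[Point]) -> float:
    """Coordinator side: add up the per-machine costs.
    In a real deployment each machine would send back a single number;
    here the 'machines' are simply lists in one process (an assumption)."""
    return sum(local_cost(shard, centers) for shard in shards)


# Hypothetical toy data: two machines, k = 2 centers.
shards = [
    [(0.0, 0.0), (1.0, 0.0)],    # points held by machine 1
    [(10.0, 0.0), (11.0, 0.0)],  # points held by machine 2
]
centers = [(0.5, 0.0), (10.5, 0.0)]

total = distributed_cost(shards, centers)
print(total)  # each of the 4 points is 0.5 from its center: 4 * 0.25 = 1.0
```

A lower total for the same data and the same k means a better clustering under this objective, which is the sense in which the abstract compares the proposed algorithm with k-means||.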

Code & Models

Repositories

selotape/distributed_k_means (Official)

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Stream Mining Techniques · Face and Expression Recognition

Methods: k-Means Clustering