Efficient Parallel Algorithms for k-Center Clustering
Jessica McClintock, Anthony Wirth

TL;DR
This paper develops and compares parallel algorithms for the NP-hard k-center clustering problem, demonstrating that a parallelized greedy approach is highly efficient and effective for large datasets.
Contribution
It introduces a parallel approximation algorithm for k-center clustering based on Gonzalez's greedy method, achieving fast runtimes with minimal loss in solution quality.
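For readers unfamiliar with the sequential baseline, Gonzalez's greedy method (farthest-first traversal) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `gonzalez`, the Euclidean metric, and the choice of the first input point as the initial center are assumptions made here for the example. The method is known to give a 2-approximation for k-center in a metric space.

```python
import math


def gonzalez(points, k):
    """Farthest-first traversal (illustrative sketch).

    Returns the indices of the chosen centers and the resulting covering
    radius. Assumes Euclidean distance and picks points[0] as the first
    center; both are choices made for this example only.
    """
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")

    centers = [0]
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [math.dist(p, points[0]) for p in points]

    while len(centers) < min(k, len(points)):
        # The next center is the point farthest from all current centers.
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(far)
        for i, p in enumerate(points):
            d = math.dist(p, points[far])
            if d < dist[i]:
                dist[i] = d

    return centers, max(dist)
```

Each of the k iterations scans every point once, so the sequential cost is O(nk) distance evaluations; that per-iteration scan is the part a parallel version distributes.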
Findings
The parallel Gonzalez algorithm is about 100 times faster than the sequential version.
Two MapReduce rounds suffice for a 4-approximation.
Trade-offs between runtime and approximation quality are demonstrated.
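The two-round scheme mentioned above can be simulated on a single machine as a rough sketch. Everything here is illustrative rather than taken from the paper: the helper names (`greedy_centers`, `two_round_k_center`, `radius`), the round-robin partitioning, the Euclidean metric, and the `machines` parameter are assumptions. Round 1 runs the greedy method independently on each partition; round 2 runs it once more on the union of the per-partition centers, which is assumed to fit on one machine.

```python
import math


def greedy_centers(points, k):
    """Gonzalez-style farthest-first traversal; returns the center points."""
    centers = [points[0]]
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def two_round_k_center(points, k, machines):
    """Sequential simulation of a two-round, MapReduce-style greedy scheme."""
    # Round 1: split the input arbitrarily (round-robin here, an assumption)
    # and let each simulated reducer pick k centers from its own part.
    parts = [points[i::machines] for i in range(machines)]
    candidates = [c for part in parts if part
                  for c in greedy_centers(part, k)]
    # Round 2: a single reducer runs greedy on at most k * machines candidates.
    return greedy_centers(candidates, k)


def radius(points, centers):
    """Covering radius: largest distance from any point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)
```

A small usage example: with `points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0), (21, 0)]`, `k = 3`, and `machines = 2`, the sketch returns three centers, one per natural cluster. Each greedy pass can at most double the radius relative to the optimum, which is the intuition behind a factor of 4 for two rounds.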
Abstract
The k-center problem is one of several classic NP-hard clustering questions. For contemporary massive data sets, RAM-based algorithms become impractical. And although there exist good sequential algorithms for k-center, they are not easily parallelizable. In this paper, we design and implement parallel approximation algorithms for this problem. We observe that Gonzalez's greedy algorithm can be efficiently parallelized in several MapReduce rounds; in practice, we find that two rounds are sufficient, leading to a 4-approximation. We contrast this with an existing parallel algorithm for k-center that runs in a constant number of rounds, and offers a 10-approximation. In-depth runtime analysis reveals that this scheme is often slow, and that its sampling procedure only runs if k is sufficiently small, relative to the input size. To trade off runtime for approximation guarantee, we…
