Efficient Parallel Algorithms for k-Center Clustering

Jessica McClintock; Anthony Wirth

arXiv:1604.03228·cs.DC·April 13, 2016

TL;DR

This paper develops and compares parallel algorithms for the NP-hard k-center clustering problem, demonstrating that a parallelized greedy approach is highly efficient and effective for large datasets.

Contribution

It introduces a parallel approximation algorithm for k-center clustering based on Gonzalez's greedy method, achieving fast runtimes with minimal loss in solution quality.

Findings

01

The parallel Gonzalez algorithm is about 100 times faster than the sequential algorithm.

02

In practice, two MapReduce rounds suffice, which yields a 4-approximation.

03

Trade-offs between runtime and approximation quality are demonstrated.

Abstract

The k-center problem is one of several classic NP-hard clustering questions. For contemporary massive data sets, RAM-based algorithms become impractical. And although there exist good sequential algorithms for k-center, they are not easily parallelizable. In this paper, we design and implement parallel approximation algorithms for this problem. We observe that Gonzalez's greedy algorithm can be efficiently parallelized in several MapReduce rounds; in practice, we find that two rounds are sufficient, leading to a 4-approximation. We contrast this with an existing parallel algorithm for k-center that runs in a constant number of rounds, and offers a 10-approximation. In-depth runtime analysis reveals that this scheme is often slow, and that its sampling procedure only runs if k is sufficiently small, relative to the input size. To trade off runtime for approximation guarantee, we…
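
The two-round idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `gonzalez`, `radius`, and `two_round_k_center`, the `num_machines` parameter, the use of Euclidean distance, and the simulation of machines by list slices are all assumptions made for illustration. Each simulated machine runs Gonzalez's greedy farthest-first traversal on its share of the points, and a single machine then runs it again on the collected candidate centers.

```python
import math


def gonzalez(points, k):
    """Greedy farthest-first traversal: pick up to k centers from points."""
    centers = [points[0]]  # the first center is arbitrary
    # dist[i] holds the distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # the next center is the point farthest from all current centers
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def radius(points, centers):
    """k-center objective: largest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def two_round_k_center(points, k, num_machines):
    """Simulate two rounds; list slices stand in for machines (an assumption)."""
    # Round 1: partition the input arbitrarily and run Gonzalez on each part
    chunks = [points[i::num_machines] for i in range(num_machines)]
    candidates = [c for chunk in chunks if chunk for c in gonzalez(chunk, k)]
    # Round 2: one machine runs Gonzalez on the at most k * num_machines candidates
    return gonzalez(candidates, k)
```

For example, `two_round_k_center(points, k=5, num_machines=4)` on a list of coordinate tuples returns five centers, and `radius(points, centers)` evaluates their quality. In this sketch, every point lies within twice the optimal radius of a first-round candidate, and every candidate lies within twice the optimal radius of a final center, which is one way to see where a factor of 4 can come from.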
