Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

Erich Schubert; Peter J. Rousseeuw

arXiv:1810.05691 · cs.LG · July 8, 2024

TL;DR

This paper introduces modifications to the PAM, CLARA, and CLARANS clustering algorithms that significantly reduce their runtime, enabling their application to larger datasets and higher cluster counts without sacrificing result quality.

Contribution

The authors propose a new, faster SWAP phase for PAM and its variants, achieving up to a 200-fold speedup while maintaining clustering accuracy.

Findings

01. 200-fold speedup in PAM on real data with k=100

02. Enables PAM to handle larger datasets and higher k values

03. Maintains clustering quality despite faster computation

Abstract

Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm Partitioning Around Medoids (PAM), also simply referred to as k-medoids. In Euclidean geometry the mean, as used in k-means, is a good estimator for the cluster center, but this does not hold for arbitrary dissimilarities. PAM uses the medoid instead, the object with the smallest dissimilarity to all others in the cluster. This notion of centrality can be used with any (dis-)similarity, and thus is of high relevance to many domains such as biology that require the use of Jaccard, Gower, or more complex distances. A key issue with PAM is its high run time cost. We propose modifications to the PAM algorithm that achieve an O(k)-fold speedup in the second (SWAP) phase of the algorithm while still finding the same results as the original PAM algorithm. If we…
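To make the abstract's terms concrete, the following is a minimal Python sketch of a medoid under an arbitrary dissimilarity (Jaccard here) and of one deliberately naive SWAP step. It is an illustration only: the function names (`jaccard_distance`, `medoid`, `total_deviation`, `naive_swap_step`) are hypothetical, the swap step recomputes the full cost for every candidate, and it is neither the original PAM bookkeeping nor the faster SWAP variant the paper proposes.

```python
from itertools import product


def jaccard_distance(a, b):
    """Jaccard dissimilarity between two sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def medoid(cluster, dist):
    """Index of the object with the smallest total dissimilarity to all others."""
    return min(
        range(len(cluster)),
        key=lambda i: sum(dist(cluster[i], x) for x in cluster),
    )


def total_deviation(data, medoid_ids, dist):
    """Sum over all objects of the dissimilarity to the nearest medoid."""
    return sum(min(dist(x, data[m]) for m in medoid_ids) for x in data)


def naive_swap_step(data, medoid_ids, dist):
    """Try every (medoid, non-medoid) exchange and return the best improving one.

    Assumption: this brute-force version re-evaluates the whole objective for
    each of the k * (n - k) candidate swaps; it only illustrates what SWAP
    searches for, not how PAM or the paper evaluates it efficiently.
    """
    best_cost = total_deviation(data, medoid_ids, dist)
    best = None
    for pos, cand in product(range(len(medoid_ids)), range(len(data))):
        if cand in medoid_ids:
            continue  # only non-medoids can be swapped in
        trial = list(medoid_ids)
        trial[pos] = cand
        cost = total_deviation(data, trial, dist)
        if cost < best_cost:
            best_cost, best = cost, trial
    # If no swap improves the objective, the medoids are returned unchanged.
    return (best if best is not None else list(medoid_ids)), best_cost
```

Because only a pairwise dissimilarity function is required, the same sketch works for sets, strings, or mixed-type records where a mean is not defined. Repeating `naive_swap_step` until the cost stops decreasing gives a simple, slow local search in the spirit of the SWAP phase.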
