K-Medoids For K-Means Seeding
James Newling, François Fleuret

TL;DR
This paper shows that the clarans algorithm finds better K-medoids solutions than Voronoi iteration and is an effective K-means initializer, outperforming other seeding algorithms on all 23 tested datasets.
Contribution
It proposes the existing clarans K-medoids algorithm (Ng et al., 2005) as a K-means initializer and describes complexity and runtime improvements that make it viable for large datasets.
Findings
clarans outperforms Voronoi iteration in K-medoids solutions
clarans improves K-means initialization over K-means++ on all tested datasets
proposed runtime improvements make clarans suitable for large datasets
Abstract
We run experiments showing that algorithm clarans (Ng et al., 2005) finds better K-medoids solutions than the Voronoi iteration algorithm. This finding, along with the similarity between the Voronoi iteration algorithm and Lloyd's K-means algorithm, suggests that clarans may be an effective K-means initializer. We show that this is the case, with clarans outperforming other seeding algorithms on 23/23 datasets with a mean decrease over K-means++ of 30% for initialization mse and 3% for final mse. We describe how the complexity and runtime of clarans can be improved, making it a viable initialization scheme for large datasets.
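To make the idea in the abstract concrete, the following is a minimal sketch of a clarans-style randomized swap search used to pick K-means seeds. It is not the authors' implementation: the function names (`sq_dist`, `medoid_cost`, `clarans_seed`), the `max_rejects` stopping rule, and the use of squared Euclidean distance as the dissimilarity are assumptions made for illustration. It also recomputes the full cost for every proposed swap, so it includes none of the complexity or runtime improvements the paper describes.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (assumed dissimilarity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def medoid_cost(points, medoid_idx):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(sq_dist(p, points[m]) for m in medoid_idx) for p in points)


def clarans_seed(points, k, max_rejects=200, seed=0):
    """Hypothetical clarans-style search: propose random medoid/non-medoid
    swaps, keep a swap only if it lowers the cost, and stop after
    `max_rejects` consecutive rejected proposals."""
    rng = random.Random(seed)
    n = len(points)
    medoids = rng.sample(range(n), k)      # random initial medoids
    cost = medoid_cost(points, medoids)
    rejects = 0
    while rejects < max_rejects:
        slot = rng.randrange(k)            # which medoid to replace
        candidate = rng.randrange(n)       # proposed replacement point
        if candidate in medoids:
            rejects += 1                   # not a valid swap; count as rejected
            continue
        proposal = list(medoids)
        proposal[slot] = candidate
        new_cost = medoid_cost(points, proposal)  # naive full recomputation
        if new_cost < cost:
            medoids, cost, rejects = proposal, new_cost, 0
        else:
            rejects += 1
    return [points[m] for m in medoids], cost
```

The returned medoids are actual data points, so they can be passed directly as initial centers to any Lloyd-style K-means routine that accepts user-supplied seeds.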
Taxonomy
Topics: Machine Learning and Data Classification · Algorithms and Data Compression · Data Management and Algorithms
