A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

M. Emre Celebi; Hassan A. Kingravi; Patricio A. Vela

arXiv:1209.1960 · cs.LG · September 11, 2012


TL;DR

This paper compares eight linear-time initialization methods for K-means clustering on a large and diverse collection of data sets, evaluates their efficiency and effectiveness under several performance criteria, and offers recommendations for practitioners based on non-parametric statistical tests.

Contribution

It compares eight commonly used linear-time K-means initialization methods through extensive experiments, shows where popular methods fall short, and identifies stronger alternatives.

Findings

01. Popular initialization methods often perform poorly.

02. There are strong alternatives to these popular methods.

03. Recommendations for practitioners are derived from non-parametric statistical tests on the experimental results.

Abstract

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.
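
The sensitivity to initial center placement described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the `lloyd` function, the four-point data set, and both sets of starting centers are hypothetical and chosen only to make the effect easy to see. The code runs plain batch K-means (Lloyd's algorithm) from two hand-picked initializations and compares the resulting sum of squared errors (SSE).

```python
import numpy as np

def lloyd(points, init_centers, max_iter=100):
    """Batch K-means (Lloyd's algorithm) started from the given centers."""
    centers = np.asarray(init_centers, dtype=float).copy()

    def assign(c):
        # Squared Euclidean distance from every point to every center.
        d2 = ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving: a local optimum was reached
        centers = new_centers

    labels = assign(centers)
    sse = float(((points - centers[labels]) ** 2).sum())
    return centers, labels, sse

# Hypothetical data: corners of a 4-by-1 rectangle, clustered with K = 2.
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

# Start A: one center on each short side (splits the data left/right).
_, _, sse_a = lloyd(points, [[0.0, 0.0], [4.0, 0.0]])

# Start B: centers stacked in the middle (splits the data bottom/top).
_, _, sse_b = lloyd(points, [[2.0, 0.0], [2.0, 1.0]])

print(sse_a, sse_b)  # 1.0 16.0
```

Both runs converge, but start B is already a fixed point of the algorithm, so K-means stays at a local optimum whose SSE is 16 times that of start A. This is the kind of outcome that initialization methods are meant to avoid.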


Code & Models

Repositories

tugrulhkarabulut/K-Means-Clustering
