Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
M. Emre Celebi, Hassan A. Kingravi

TL;DR
This paper evaluates six linear, deterministic, and order-invariant initialization methods for k-means clustering. Two hierarchical methods prove the most effective, while the recent method of Erisoglu et al. performs poorly.
Contribution
It provides an empirical comparison of six initialization methods, highlighting the superior performance of two hierarchical approaches for large, diverse datasets.
Findings
Two hierarchical methods outperform others in effectiveness.
Erisoglu et al.'s recent method performs poorly.
Deterministic, order-invariant methods can be reliable alternatives to repeated runs of random initialization methods.
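To make "linear, deterministic, and order-invariant" concrete, the following is a minimal sketch of a divisive (hierarchical) initializer. It is an illustration only and is not claimed to be any of the six methods evaluated in the paper. The function name `divisive_init` and the splitting rule (split the cluster with the largest sum of squared errors at its mean, along its highest-variance axis) are assumptions made for this example. It uses no randomness, makes a fixed number of passes over the data for a given k, and uses `math.fsum` so that sums do not depend on the order of the points.

```python
import math
import random


def mean(points):
    """Component-wise mean; math.fsum keeps the sum independent of point order."""
    n = len(points)
    return tuple(math.fsum(col) / n for col in zip(*points))


def axis_sse(points, center):
    """Sum of squared deviations from the center, computed separately per axis."""
    return [
        math.fsum((p[d] - center[d]) ** 2 for p in points)
        for d in range(len(center))
    ]


def divisive_init(points, k):
    """Hypothetical helper: return k initial centers by repeated splitting.

    Assumed rule (for illustration): split the cluster with the largest total
    SSE at its mean, along the axis on which that cluster varies the most.
    """
    clusters = [list(points)]
    while len(clusters) < k:
        stats = []
        for c in clusters:
            m = mean(c)
            per_axis = axis_sse(c, m)
            stats.append((math.fsum(per_axis), m, per_axis))
        # Choose the cluster with the largest SSE, then its widest axis.
        idx = max(range(len(clusters)), key=lambda i: stats[i][0])
        total, m, per_axis = stats[idx]
        if total == 0:
            raise ValueError("fewer than k distinct points")
        axis = max(range(len(m)), key=lambda d: per_axis[d])
        target = clusters.pop(idx)
        clusters.append([p for p in target if p[axis] <= m[axis]])
        clusters.append([p for p in target if p[axis] > m[axis]])
    # Sort so the returned centers have a canonical order.
    return sorted(mean(c) for c in clusters)


# Three small, well-separated groups of 2-D points.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 0), (11, 0), (10, 1), (11, 1),
    (5, 20), (6, 20), (5, 21), (6, 21),
]
shuffled = points[:]
random.Random(0).shuffle(shuffled)

# The same centers come back regardless of the order of the input.
print(divisive_init(points, 3) == divisive_init(shuffled, 3))
```

Because nothing in the procedure depends on a random seed or on the position of a point in the input, a single run suffices. The resulting centers would then be passed to k-means as its starting point.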
Abstract
Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best…
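The multiple-runs practice mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' experimental setup. It assumes plain batch (Lloyd-style) k-means, uniform random selection of k data points as initial centers, and the sum of squared errors (SSE) as the quality measure. The names `lloyd` and `best_of_restarts` are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, iters=100):
    """Batch k-means from the given initial centers; returns (centers, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(math.fsum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = math.fsum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def best_of_restarts(points, k, runs, seed=0):
    """Run k-means from `runs` random initializations and keep the lowest SSE."""
    rng = random.Random(seed)
    results = [lloyd(points, rng.sample(points, k)) for _ in range(runs)]
    best = min(results, key=lambda r: r[1])
    return best, [r[1] for r in results]


points = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 0), (11, 0), (10, 1), (11, 1),
    (5, 20), (6, 20), (5, 21), (6, 21),
]
best, all_sse = best_of_restarts(points, k=3, runs=10)
print("SSE per run:", all_sse)
print("best SSE:", best[1])
```

The spread of the per-run SSE values shows how much the outcome can depend on the starting centers. The cost of this approach grows with the number of runs, which is the overhead a deterministic initializer avoids.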
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Multi-Criteria Decision Making
