Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering
M. Emre Celebi, Hassan A. Kingravi

TL;DR
This paper introduces a discriminant-analysis-based enhancement to hierarchical initialization methods for K-means clustering. It improves their performance while keeping them linear in the number of data points, deterministic, and order-invariant.
Contribution
It proposes a discriminant-analysis-based approach that makes the Var-Part and PCA-Part hierarchical initialization methods more effective while preserving their computational advantages.
Findings
Hierarchical methods are competitive with k-means++ for initialization.
The proposed approach significantly improves hierarchical methods' performance.
Experiments on diverse datasets validate the effectiveness of the new method.
Abstract
K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. Many of these methods, however, have superlinear complexity in the number of data points, making them impractical for large data sets. On the other hand, linear methods are often random and/or order-sensitive, which renders their results unrepeatable. Recently, Su and Dy proposed two highly successful hierarchical initialization methods named Var-Part and PCA-Part that are not only linear, but also deterministic (non-random) and order-invariant. In this paper, we propose a discriminant analysis based approach that addresses a common deficiency of these two methods. Experiments on a large and diverse…
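Example
To make the idea of a linear, deterministic, order-invariant hierarchical initialization concrete, here is a minimal Python sketch in the spirit of Var-Part as it is commonly described: repeatedly take the cluster with the largest sum of squared errors, split it along its highest-variance coordinate at that coordinate's mean, and use the resulting centroids as initial centers. This is an illustration under those assumptions, not the authors' code, and it does not implement the discriminant-analysis-based variant the paper proposes. The function name `var_part_init` and the handling of degenerate splits are hypothetical choices made for the example.

```python
import numpy as np

def var_part_init(X, k):
    """Sketch of a Var-Part-style initialization; returns up to k centers.

    Assumption: split the highest-SSE cluster along its highest-variance
    axis at the mean of that axis, until k clusters exist.
    """
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]  # each cluster is an array of row indices
    while len(clusters) < k:
        # Sum of squared errors of each cluster around its own centroid.
        sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        idx = clusters.pop(int(np.argmax(sse)))
        pts = X[idx]
        # Coordinate axis with the largest variance inside this cluster.
        d = int(np.argmax(pts.var(axis=0)))
        # Partition the cluster at the mean along that axis.
        mask = pts[:, d] <= pts[:, d].mean()
        if mask.all() or not mask.any():
            # Degenerate cluster (all points identical): stop splitting.
            clusters.append(idx)
            break
        clusters.extend([idx[mask], idx[~mask]])
    return np.array([X[idx].mean(axis=0) for idx in clusters])

# Two well-separated groups of four points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [10.0, 0.0], [10.0, 1.0], [11.0, 0.0], [11.0, 1.0]])
centers = var_part_init(X, 2)
```

Because no random numbers are drawn and every decision depends only on means and variances, reordering the rows of `X` yields the same set of centers, which is the repeatability property the abstract emphasizes. The returned centers would then be passed to a standard K-means run as its starting point.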
