Improved Clustering with Augmented k-means
J. Andrew Howe

TL;DR
This paper introduces Augmented k-means, a hybrid clustering algorithm combining k-means and logistic regression, which improves clustering accuracy and convergence speed on complex datasets.
Contribution
The paper presents a novel hybrid clustering algorithm that integrates logistic regression into k-means to handle heterogeneity and overlap more effectively.
Findings
Outperforms standard k-means in accuracy
Converges faster on complex datasets
Effective on both simulated and real data
Abstract
Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In unsupervised partitional clustering, k-means is a central algorithm for this task. In this technical report, we develop a new k-means variant called Augmented k-means, which is a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the resulting cluster membership probabilities control the subsequent re-estimation of the cluster means. Observations that cannot be confidently assigned to a cluster are excluded from the re-estimation step. This can be valuable when the data exhibit characteristics common in real datasets, such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by more…
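The iteration described in the abstract can be sketched as follows. This is a minimal illustration of one reading of the abstract, not the report's implementation. Several details are assumptions that the abstract leaves unspecified: "firmly identified" is taken to mean that the logistic-regression probability of an observation's current k-means label is at least a threshold `tau`; the logistic regression is a plain softmax model fit by gradient descent; initialization is farthest-first; and a cluster whose members are all excluded falls back to using every member. The names `fit_softmax`, `augmented_kmeans`, and `tau` are hypothetical.

```python
import numpy as np


def fit_softmax(X, labels, k, lr=0.5, iters=200):
    """Multinomial logistic regression via gradient descent (assumed stand-in).

    Returns an (n, k) matrix of cluster membership probabilities.
    """
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # avoid division by zero on constant columns
    Xs = (X - X.mean(axis=0)) / sd         # standardize so a fixed step size is stable
    Xb = np.hstack([Xs, np.ones((len(X), 1))])  # add intercept column
    W = np.zeros((Xb.shape[1], k))
    Y = np.eye(k)[labels]                  # one-hot encoding of the current labels

    def softmax(Z):
        Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
        E = np.exp(Z)
        return E / E.sum(axis=1, keepdims=True)

    for _ in range(iters):
        P = softmax(Xb @ W)
        W -= lr * Xb.T @ (P - Y) / len(X)  # gradient of mean cross-entropy
    return softmax(Xb @ W)


def augmented_kmeans(X, k, tau=0.8, max_iter=50, seed=0):
    """Hypothetical sketch: k-means whose mean update skips low-confidence points."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-first initialization (our choice, not taken from the report).
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers, dtype=float)

    labels = None
    for _ in range(max_iter):
        # Standard k-means assignment: nearest center by squared Euclidean distance.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                          # assignments stable: converged
        labels = new_labels

        # Logistic regression predicts the current labels.
        P = fit_softmax(X, labels, k)
        confident = P[np.arange(len(X)), labels] >= tau

        # Re-estimate each mean from its confidently assigned members only.
        for j in range(k):
            members = (labels == j) & confident
            if not members.any():
                members = labels == j      # assumed fallback: use all members
            if members.any():
                centers[j] = X[members].mean(axis=0)
    return labels, centers
```

For example, on two well-separated Gaussian blobs, `augmented_kmeans(X, 2)` should place each blob in its own cluster. Raising `tau` excludes more borderline observations from the mean update, which is the mechanism the abstract credits for robustness under overlap and scatter.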
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Statistical Methods and Inference
Methods: Logistic Regression
