Improved Clustering with Augmented k-means

J. Andrew Howe

arXiv:1705.07592 · stat.ML · May 23, 2017 · 2 cites

TL;DR

This paper introduces Augmented k-means, a hybrid clustering algorithm combining k-means and logistic regression, which improves clustering accuracy and convergence speed on complex datasets.

Contribution

The paper presents a novel hybrid clustering algorithm that integrates logistic regression into k-means to handle heterogeneity and overlap more effectively.

Findings

1. Outperforms standard k-means in accuracy
2. Converges faster on complex datasets
3. Effective on both simulated and real data

Abstract

Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering, k-means is a very important algorithm for this. In this technical report, we develop a new k-means variant called Augmented k-means, which is a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the cluster belonging probabilities are used to control the subsequent re-estimation of cluster means. Observations which can't be firmly identified into clusters are excluded from the re-estimation step. This can be valuable when the data exhibit many characteristics of real datasets such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by more…
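The iteration described in the abstract can be sketched roughly as follows. This is a minimal illustration written from the abstract alone, not the author's implementation. Everything the abstract does not specify is an assumption: the function names (`augmented_kmeans_sketch`, `fit_predict_proba`), the confidence cutoff `threshold=0.8`, the use of plain gradient descent to fit the multinomial logistic regression, the reading of "firmly identified" as "probability of the currently assigned cluster is at least the cutoff", and the fallback to all cluster members when none pass the cutoff.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_predict_proba(X, labels, k, steps=300):
    """Fit multinomial logistic regression to the current labels by plain
    gradient descent and return the fitted class probabilities.
    Assumption: the abstract does not say how the regression is fitted."""
    n, d = X.shape
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize features
    Zb = np.hstack([Z, np.ones((n, 1))])                # add intercept column
    Y = np.eye(k)[labels]                               # one-hot targets
    W = np.zeros((d + 1, k))
    lr = 1.0 / (d + 1)                                  # conservative step size
    for _ in range(steps):
        W -= lr * Zb.T @ (softmax(Zb @ W) - Y) / n
    return softmax(Zb @ W)


def augmented_kmeans_sketch(X, k, threshold=0.8, max_iter=50,
                            init_means=None, seed=0):
    """Hypothetical sketch: k-means whose mean update uses only the
    observations that logistic regression assigns confidently."""
    X = np.asarray(X, dtype=float)
    if init_means is None:
        rng = np.random.default_rng(seed)
        init_means = X[rng.choice(len(X), size=k, replace=False)]
    means = np.array(init_means, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Standard k-means assignment: nearest mean by squared distance.
        sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Logistic regression predicts the current cluster labels.
        probs = fit_predict_proba(X, labels, k)
        # Assumed rule: keep points whose assigned-cluster probability
        # reaches the cutoff; the rest are left out of the mean update.
        confident = probs[np.arange(len(X)), labels] >= threshold
        for j in range(k):
            in_cluster = labels == j
            members = in_cluster & confident
            if not members.any():
                members = in_cluster  # assumed fallback: use all members
            if members.any():
                means[j] = X[members].mean(axis=0)
    return labels, means
```

Calling `augmented_kmeans_sketch(X, k=2)` on an `(n, d)` array returns the final labels and cluster means. Setting `threshold=0.0` keeps every observation in the update, which reduces the sketch to ordinary k-means and gives a convenient baseline for comparison.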

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Statistical Methods and Inference

Methods: Logistic Regression