A novel k-means clustering approach using two distance measures for Gaussian data
Naitik Gada (1) ((1) Rochester Institute of Technology)

TL;DR
This paper introduces a new k-means clustering algorithm that combines within-cluster and inter-cluster distances to improve clustering accuracy and robustness, particularly for outliers. The approach is demonstrated on synthetic and benchmark datasets.
Contribution
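The paper proposes a novel k-means approach that uses both the within-cluster distance (WCD) and the inter-cluster distance (ICD) as its distance metric, with the aim of improving clustering robustness and the handling of outliers relative to traditional k-means.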
Findings
More accurate cluster convergence with combined distance measures
Improved outlier clustering accuracy
Robustness demonstrated on synthetic and UCI benchmark datasets
Abstract
Clustering algorithms have long been a topic of research, representing the more popular side of unsupervised learning. Since clustering analysis is one of the best ways to find some clarity and structure within raw data, this paper explores a novel approach to k-means clustering. Here we present a k-means clustering algorithm that takes both the within-cluster distance (WCD) and the inter-cluster distance (ICD) as the distance metric to cluster the data into k clusters, with k pre-determined by the Calinski-Harabasz criterion, in order to provide a more robust output for the clustering analysis. The idea with this approach is that by including both measurement metrics, the convergence of the data into their clusters becomes more stable and more robust. We run the algorithm with some synthetically produced data and also some benchmark data sets obtained from the UCI…
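As a rough illustration of the pre-processing step the abstract mentions, the sketch below picks k with the Calinski-Harabasz criterion on synthetic Gaussian data. It is not the paper's algorithm: it uses plain Lloyd's k-means (nearest-centroid assignment only), and the helper names (lloyd_kmeans, wcd, calinski_harabasz, choose_k, gaussian_blobs), the restart count, and the toy data are all assumptions made for this example. How the paper combines WCD and ICD in the assignment step is not specified in the text above and is not reproduced here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def wcd(points, labels, centroids):
    """Within-cluster distance: summed squared distance to the own centroid."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))


def lloyd_kmeans(points, k, rng, iters=100):
    """Plain Lloyd's k-means (assumption: stands in for the paper's variant)."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def calinski_harabasz(points, labels, centroids):
    """CH index: (between-dispersion / (k-1)) / (within-dispersion / (n-k))."""
    n, k = len(points), len(centroids)
    overall = mean(points)
    sizes = [labels.count(j) for j in range(k)]
    between = sum(s * sq_dist(c, overall) for s, c in zip(sizes, centroids))
    within = wcd(points, labels, centroids)
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, k_max, restarts=10, seed=0):
    """Score k = 2..k_max and return the k with the highest CH index."""
    rng = random.Random(seed)
    scores = {}
    for k in range(2, k_max + 1):
        runs = [lloyd_kmeans(points, k, rng) for _ in range(restarts)]
        # Keep the restart with the lowest within-cluster distance.
        labels, centroids = min(runs, key=lambda r: wcd(points, r[0], r[1]))
        scores[k] = calinski_harabasz(points, labels, centroids)
    return max(scores, key=scores.get), scores


def gaussian_blobs(centers, per_blob, sd, seed=1):
    """Hypothetical toy data: isotropic 2-D Gaussian blobs around each center."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, sd), rng.gauss(cy, sd))
            for cx, cy in centers for _ in range(per_blob)]


points = gaussian_blobs([(0, 0), (10, 0), (0, 10)], per_blob=30, sd=0.5)
best_k, scores = choose_k(points, k_max=5)
print(best_k, scores)
```

On blobs this well separated, the criterion should favor the number of generating centers. On overlapping or non-Gaussian data the choice of k may be less clear-cut, and a WCD/ICD-based assignment rule would replace the assign step in this sketch.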
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Anomaly Detection Techniques and Applications
