Anomaly Detection and Improvement of Clusters using Enhanced K-Means Algorithm
Vardhan Shorewala, Shivam Shorewala

TL;DR
This paper presents an enhanced K-Means algorithm that refines clusters by iteratively reducing intra-cluster variance and flags as anomalies the points whose assignment significantly increases that variance. The method is validated on synthetic and real datasets.
Contribution
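The paper introduces a novel iterative K-Means variant that improves cluster compactness and integrates anomaly detection based on variance impact. It is validated with intrinsic and external clustering measures on synthetic data and the UCI Breast Cancer and UCI Wine Quality datasets.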
Findings
18.7% variance reduction on synthetic data
88.1% variance reduction on the Wine Quality dataset
20.8% F1 score improvement on the Wine Quality dataset
Abstract
This paper introduces a unified approach to cluster refinement and anomaly detection in datasets. We propose a novel algorithm that iteratively reduces the intra-cluster variance of N clusters until a global minimum is reached, yielding tighter clusters than the standard k-means algorithm. We evaluate the method using intrinsic measures for unsupervised learning, including the silhouette coefficient, Calinski-Harabasz index, and Davies-Bouldin index, and extend it to anomaly detection by identifying points whose assignment causes a significant variance increase. External validation on synthetic data and the UCI Breast Cancer and UCI Wine Quality datasets employs the Jaccard similarity score, V-measure, and F1 score. Results show variance reductions of 18.7% and 88.1% on the synthetic and Wine Quality datasets, respectively, along with accuracy and F1 score improvements of 22.5% and…
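The variance-impact idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single, already-formed cluster, measures intra-cluster variance as the mean squared Euclidean distance to the centroid, and uses a leave-one-out comparison as a stand-in for "assignment causes a significant variance increase". The function names and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def cluster_variance(points: List[Point]) -> float:
    """Mean squared Euclidean distance from each point to the centroid."""
    if not points:
        return 0.0
    c = centroid(points)
    total = sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total / len(points)


def flag_anomalies(points: List[Point], threshold: float = 0.5) -> List[int]:
    """Return indices of points whose removal lowers the cluster variance
    by more than `threshold`, expressed as a fraction of the original variance.

    Assumption: `threshold` is an illustrative cutoff, not a value from the paper.
    """
    base = cluster_variance(points)
    if base == 0.0 or len(points) < 3:
        return []
    flagged = []
    for i in range(len(points)):
        rest = points[:i] + points[i + 1:]
        relative_drop = (base - cluster_variance(rest)) / base
        if relative_drop > threshold:
            flagged.append(i)
    return flagged


# Four points on a unit square plus one distant point.
cluster = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
print(flag_anomalies(cluster))  # → [4]
```

In this toy cluster, removing the distant point drops the variance from 29.28 to 0.5, while removing any corner of the square raises it, so only index 4 is flagged. A full method would also need the iterative reassignment step described in the abstract, which this sketch omits.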
