Anomaly Detection and Improvement of Clusters using Enhanced K-Means Algorithm

Vardhan Shorewala; Shivam Shorewala

arXiv:2505.24365·cs.LG·June 2, 2025

TL;DR

This paper presents an enhanced K-Means algorithm that refines clusters by iteratively reducing intra-cluster variance and flags as anomalies the points whose assignment significantly increases that variance. The method is validated on synthetic and real datasets.

Contribution

The paper introduces a novel iterative K-Means variant that improves cluster compactness and integrates anomaly detection based on variance impact. It is validated on synthetic data and on the UCI Breast Cancer and UCI Wine Quality datasets.

Findings

01 Variance reduction of 18.7% on synthetic data

02 Variance reduction of 88.1% on the UCI Wine Quality dataset

03 F1 score improvement of 20.8% on the UCI Wine Quality dataset

Abstract

This paper introduces a unified approach to cluster refinement and anomaly detection in datasets. We propose a novel algorithm that iteratively reduces the intra-cluster variance of N clusters until a global minimum is reached, yielding tighter clusters than the standard k-means algorithm. We evaluate the method using intrinsic measures for unsupervised learning, including the silhouette coefficient, Calinski-Harabasz index, and Davies-Bouldin index, and extend it to anomaly detection by identifying points whose assignment causes a significant variance increase. External validation on synthetic data and the UCI Breast Cancer and UCI Wine Quality datasets employs the Jaccard similarity score, V-measure, and F1 score. Results show variance reductions of 18.7% and 88.1% on the synthetic and Wine Quality datasets, respectively, along with accuracy and F1 score improvements of 22.5% and…
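
The variance-impact idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a leave-one-out comparison within a single cluster, and the helper names (`centroid`, `intra_cluster_variance`, `flag_anomalies`) and the `ratio_threshold` parameter are hypothetical choices made for illustration. Intra-cluster variance is taken here to be the mean squared Euclidean distance to the centroid, which is also an assumption.

```python
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def intra_cluster_variance(points):
    """Mean squared Euclidean distance from each point to the centroid."""
    c = centroid(points)
    return fmean(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def flag_anomalies(points, ratio_threshold=2.0):
    """Return indices of points whose presence inflates the cluster variance.

    Hypothetical rule: a point is flagged when the variance of the full
    cluster exceeds ratio_threshold times the variance of the cluster
    with that point left out. Requires at least two points.
    """
    full_var = intra_cluster_variance(points)
    flagged = []
    for i in range(len(points)):
        rest = points[:i] + points[i + 1:]
        if full_var > ratio_threshold * intra_cluster_variance(rest):
            flagged.append(i)
    return flagged


# Four points on a unit square plus one distant point.
cluster = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(flag_anomalies(cluster))  # flags index 4, the distant point
```

Removing the distant point drops the variance from 29.28 to 0.5, so only that point crosses the threshold. A ratio test is one possible reading of "significant variance increase"; an absolute difference or a statistical test would be equally plausible under the abstract's wording.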
