K-Means Clustering With Incomplete Data with the Use of Mahalanobis Distances
Lovis Kwasi Armah, Igor Melnykov

TL;DR
This paper introduces a unified K-means clustering algorithm that uses Mahalanobis distances to better handle incomplete data with elliptical clusters, outperforming existing methods in experiments.
Contribution
The work extends previous unified clustering-imputation approaches by incorporating Mahalanobis distances, improving clustering accuracy for elliptical data shapes.
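For reference, the squared Mahalanobis distance between a point and a cluster center is usually written as shown here. The symbols $\mu_k$ and $\Sigma_k$ (mean and covariance of cluster $k$) are notation assumed for illustration and may differ from the paper's own.

```latex
d_M^2(x, \mu_k) = (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)
```

With $\Sigma_k = I$ this reduces to the squared Euclidean distance $\lVert x - \mu_k \rVert^2$, which is why standard K-means tends to favor spherical clusters.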
Findings
Our algorithm outperforms standalone imputation and clustering methods.
It achieves higher ARI and NMI scores on synthetic and real datasets.
Results demonstrate robustness across different data distributions.
Abstract
Effectively applying the K-means algorithm to clustering tasks with incomplete features remains an important research area due to its impact on real-world applications. Recent work has shown that unifying K-means clustering and imputation into a single objective function and solving the resulting optimization problem yields better results than handling imputation and clustering separately. In this work, we extend this approach by developing a unified K-means algorithm that uses Mahalanobis distances instead of the traditional Euclidean distances; previous research has shown that Mahalanobis distances perform better for clusters with elliptical shapes. We conducted extensive experiments on synthetic datasets containing up to ten elliptical clusters, as well as the IRIS dataset. Using the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), we demonstrate that our algorithm…
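The idea described in the abstract, alternating between cluster assignment under Mahalanobis distances and filling in missing features, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the cluster-mean imputation rule, the identity-covariance start, and the `reg` ridge term are all assumptions made for illustration, and the paper derives its own updates from a unified objective.

```python
import numpy as np

def sq_mahalanobis(X, center, cov):
    """Squared Mahalanobis distance of each row of X to `center` under `cov`."""
    diff = X - center
    # Solve cov @ z = diff^T rather than forming the inverse explicitly.
    z = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, z)

def kmeans_mahalanobis_incomplete(X, k, n_iter=20, reg=1e-6, seed=0):
    """Illustrative (hypothetical) loop: assign, update, re-impute.

    Missing entries in X are marked with NaN.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n, d = X.shape

    # Assumption: start by filling missing entries with observed column means.
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)
    centers = X_filled[rng.choice(n, size=k, replace=False)].copy()
    # Identity covariances make the first assignment pass Euclidean.
    covs = np.stack([np.eye(d)] * k)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Assignment step: nearest center under each cluster's own metric.
        dists = np.stack(
            [sq_mahalanobis(X_filled, centers[j], covs[j]) for j in range(k)],
            axis=1,
        )
        labels = dists.argmin(axis=1)

        # Update step: re-estimate mean and covariance of each cluster.
        for j in range(k):
            members = X_filled[labels == j]
            if len(members) > d:  # skip clusters too small for a covariance
                centers[j] = members.mean(axis=0)
                covs[j] = np.cov(members, rowvar=False) + reg * np.eye(d)

        # Imputation step (simplification): missing entries take the value
        # of the assigned cluster's center; observed entries stay untouched.
        X_filled = np.where(missing, centers[labels], X_filled)

    return labels, centers, X_filled
```

Calling `kmeans_mahalanobis_incomplete(X, k=2)` on an array with `np.nan` in place of missing features returns the labels, the cluster centers, and the completed data. Unconstrained covariances can let one cluster absorb the others, so a practical version would likely need an additional constraint such as normalizing each covariance's determinant.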
Taxonomy
Topics: Face and Expression Recognition · Advanced Clustering Algorithms Research
Methods: k-Means Clustering
