K-Means Clustering With Incomplete Data with the Use of Mahalanobis Distances

Lovis Kwasi Armah; Igor Melnykov

arXiv:2411.00870·cs.LG·April 14, 2025

TL;DR

This paper introduces a unified K-means clustering algorithm that uses Mahalanobis distances to better handle incomplete data with elliptical clusters, outperforming existing methods in experiments.

Contribution

The work extends previous unified clustering-imputation approaches by incorporating Mahalanobis distances, improving clustering accuracy for elliptical data shapes.
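For reference, the standard definition of the squared Mahalanobis distance between a point and a cluster center is given below. The symbols x, \mu_k, and \Sigma_k (point, center, and covariance matrix of cluster k) are notation chosen here for illustration and may differ from the paper's own.

```latex
d_M^2(x, \mu_k) = (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)
% With \Sigma_k = I this reduces to the squared Euclidean distance:
% d_M^2(x, \mu_k) = \lVert x - \mu_k \rVert_2^2
```

Because \Sigma_k rescales and rotates the coordinate axes, points at equal Mahalanobis distance from \mu_k lie on an ellipsoid rather than a sphere, which is why this distance suits elliptical clusters.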

Findings

01

Our algorithm outperforms standalone imputation and clustering methods.

02

It achieves higher ARI and NMI scores on synthetic and real datasets.

03

Results demonstrate robustness across different data distributions.
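The two scores named in the findings can be computed from a pair of label vectors. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since this page does not say which variant the paper uses.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """ARI: pair-counting agreement between two labelings, corrected for chance."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # a single point is trivially clustered identically
    joint = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    mutual_info = sum(
        c / n * log(n * c / (true_counts[u] * pred_counts[v]))
        for (u, v), c in joint.items()
    )
    h_true = -sum(c / n * log(c / n) for c in true_counts.values())
    h_pred = -sum(c / n * log(c / n) for c in pred_counts.values())
    if h_true + h_pred == 0:
        return 1.0  # both labelings put everything in one cluster
    return 2 * mutual_info / (h_true + h_pred)
```

Both scores are invariant to how cluster labels are named, so a prediction that recovers the true groups under different label IDs still scores 1.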

Abstract

Effectively applying the K-means algorithm to clustering tasks with incomplete features remains an important research area because of its impact on real-world applications. Recent work has shown that unifying K-means clustering and imputation into a single objective function and solving the resulting optimization problem yields better results than handling imputation and clustering separately. In this work, we extend this approach by developing a unified K-means algorithm that replaces the traditional Euclidean distance with the Mahalanobis distance, which previous research has shown to perform better for clusters with elliptical shapes. We conducted extensive experiments on synthetic datasets containing up to ten elliptical clusters, as well as on the IRIS dataset. Using the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), we demonstrate that our algorithm…
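To make the role of the distance concrete, here is a minimal sketch of a nearest-center assignment step that uses a Mahalanobis distance restricted to the observed features of each point. This is not the paper's unified objective: the observed-coordinates-only strategy, the function names, and the toy centers and covariances are all assumptions made for illustration, with missing values encoded as NaN.

```python
import numpy as np


def mahalanobis_sq_observed(x, center, cov):
    """Squared Mahalanobis distance over the non-NaN entries of x (assumed strategy)."""
    observed = ~np.isnan(x)
    if not observed.any():
        raise ValueError("point has no observed features")
    diff = x[observed] - center[observed]
    # Sub-block of the covariance matrix for the observed features.
    cov_observed = cov[np.ix_(observed, observed)]
    return float(diff @ np.linalg.solve(cov_observed, diff))


def assign_clusters(X, centers, covs):
    """Assign each row of X to the center with the smallest distance."""
    distances = np.array([
        [mahalanobis_sq_observed(x, c, s) for c, s in zip(centers, covs)]
        for x in X
    ])
    return distances.argmin(axis=1)


# Toy setup: cluster 0 is elongated along the first axis, cluster 1 is compact.
centers = np.array([[0.0, 0.0], [3.0, 0.0]])
covs = np.array([np.diag([9.0, 0.1]), np.diag([0.1, 0.1])])
X = np.array([
    [2.0, 0.0],     # fully observed
    [2.0, np.nan],  # second feature missing
    [3.1, np.nan],  # second feature missing
])
labels = assign_clusters(X, centers, covs)
```

In this toy setup the point (2.0, 0.0) is closer to the second center in Euclidean terms (distance 1 versus 2), yet it is assigned to the first cluster because that cluster is stretched along the first axis.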

Taxonomy

Topics: Face and Expression Recognition · Advanced Clustering Algorithms Research

Methods: k-Means Clustering