Using MM principles to deal with incomplete data in K-means clustering

Ali Beikmohammadi

arXiv:2212.12379 · cs.LG · December 26, 2022 · 1 citation

TL;DR

This paper proposes a method based on Majorization-Minimization (MM) principles to handle incomplete data in K-means clustering, restoring the symmetry of the data to improve clustering performance.

Contribution

It introduces a novel MM-based approach for imputing missing data in K-means, enabling the algorithm to work effectively with incomplete datasets.
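For orientation, the math block below sketches one standard way to apply MM to K-means with missing entries: the surrogate used by the k-POD method, where each gap is filled with the matching coordinate of the sample's current centroid. It is an assumption that the paper uses exactly this surrogate. The notation is introduced here for illustration: $x_i \in \mathbb{R}^d$ is sample $i$, $\Omega_i$ is its set of observed coordinates, $P_{\Omega_i}$ zeros the unobserved coordinates, and $c_i^{(m)}$ is the cluster assigned to sample $i$ at iteration $m$.

```latex
\begin{aligned}
&\text{Objective (observed coordinates only):}\quad
 f(\mu_1,\dots,\mu_K) = \sum_{i=1}^{n} \min_{k} \bigl\lVert P_{\Omega_i}(x_i - \mu_k) \bigr\rVert_2^2 \\[4pt]
&\text{Assignments:}\quad
 c_i^{(m)} = \arg\min_{k} \bigl\lVert P_{\Omega_i}(x_i - \mu_k^{(m)}) \bigr\rVert_2^2 \\[4pt]
&\text{Completed samples:}\quad
 y_i^{(m)} = P_{\Omega_i}(x_i) + P_{\Omega_i^{c}}\bigl(\mu^{(m)}_{c_i^{(m)}}\bigr) \\[4pt]
&\text{Surrogate:}\quad
 g(\mu \mid \mu^{(m)}) = \sum_{i=1}^{n} \bigl\lVert y_i^{(m)} - \mu_{c_i^{(m)}} \bigr\rVert_2^2
 \;\ge\; \sum_{i=1}^{n} \bigl\lVert P_{\Omega_i}(x_i - \mu_{c_i^{(m)}}) \bigr\rVert_2^2 \;\ge\; f(\mu),
 \quad g(\mu^{(m)} \mid \mu^{(m)}) = f(\mu^{(m)}) \\[4pt]
&\text{Minimization:}\quad
 \mu_k^{(m+1)} = \frac{1}{\lvert \{ i : c_i^{(m)} = k \} \rvert} \sum_{i :\, c_i^{(m)} = k} y_i^{(m)} \\[4pt]
&\text{Descent:}\quad
 f(\mu^{(m+1)}) \le g(\mu^{(m+1)} \mid \mu^{(m)}) \le g(\mu^{(m)} \mid \mu^{(m)}) = f(\mu^{(m)})
\end{aligned}
```

The first inequality in the surrogate line holds because $\lVert y_i^{(m)} - \mu \rVert^2$ adds the nonnegative term $\lVert P_{\Omega_i^{c}}(\mu^{(m)}_{c_i^{(m)}} - \mu) \rVert^2$, which vanishes at $\mu = \mu^{(m)}$. The second holds because a fixed assignment can do no better than the minimum over clusters.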

Findings

1. The proposed method improves clustering accuracy on standard datasets.
2. The algorithm restores the symmetry of data that have missing attributes.
3. Source code and pseudo-code are provided for reproducibility.

Abstract

Among the many clustering algorithms, K-means is widely used because it is simple and converges quickly. However, it struggles with incomplete data, in which some samples are missing some of their attributes. To address this problem, we apply MM principles to restore the symmetry of the data so that K-means can work well. We give the pseudo-code of the algorithm and verify it experimentally on standard datasets. The source code for the experiments is publicly available at https://github.com/AliBeikmohammadi/MM-Optimization/blob/main/mini-project/MM%20K-means.ipynb.
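The loop described in the abstract can be illustrated with a short NumPy sketch. This is not the author's notebook code. It assumes the k-POD-style surrogate: missing entries are encoded as `NaN`, assignments use squared distances over observed coordinates only, each gap is then filled with the assigned centroid's coordinate (majorization), and centroids are recomputed as means of the completed samples (minimization). The function names `mm_kmeans_missing` and `farthest_point_init`, the column-mean warm start, and the deterministic farthest-point seeding are hypothetical choices made for illustration.

```python
import numpy as np


def farthest_point_init(Y, k):
    """Hypothetical deterministic seeding: start from row 0, then repeatedly
    pick the row farthest (squared Euclidean) from the centroids chosen so far."""
    chosen = [0]
    d = ((Y - Y[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d.argmax())
        chosen.append(nxt)
        d = np.minimum(d, ((Y - Y[nxt]) ** 2).sum(axis=1))
    return Y[chosen].copy()


def mm_kmeans_missing(X, k, max_iter=100, tol=1e-12):
    """Hypothetical MM-style K-means for data whose missing entries are NaN.

    Returns (centroids, labels, completed data, objective history).
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)  # True where an attribute was recorded
    # Warm start (assumption): fill gaps with column means of observed values.
    Y = np.where(observed, X, np.nanmean(X, axis=0))
    centroids = farthest_point_init(Y, k)
    history = []
    for _ in range(max_iter):
        # Partial squared distances: missing coordinates contribute 0.
        diff = np.where(observed[:, None, :], X[:, None, :] - centroids[None, :, :], 0.0)
        dist = (diff ** 2).sum(axis=2)  # shape (n_samples, k)
        labels = dist.argmin(axis=1)
        history.append(dist[np.arange(len(X)), labels].sum())
        # Majorization: complete each sample with its assigned centroid.
        Y = np.where(observed, X, centroids[labels])
        # Minimization: each centroid becomes the mean of its completed samples.
        for j in range(k):
            members = Y[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
        # Stop once the observed-data objective no longer decreases.
        if len(history) > 1 and history[-2] - history[-1] <= tol:
            break
    return centroids, labels, Y, history


if __name__ == "__main__":
    # Two well-separated groups, each with two partially observed samples.
    X = np.array([[0.0, 0.0], [0.1, np.nan], [np.nan, 0.2],
                  [10.0, 10.0], [10.1, np.nan], [np.nan, 9.9]])
    centroids, labels, Y, history = mm_kmeans_missing(X, k=2)
    print(labels)  # → [0 0 0 1 1 1]
```

Because the surrogate touches the true objective at the current centroids, the recorded `history` is non-increasing, which is the usual MM descent guarantee. The farthest-point seeding keeps the example deterministic but is sensitive to outliers; random restarts or k-means++ seeding are common alternatives.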

Code & Models

Repositories

alibeikmohammadi/mm-optimization (official)

Taxonomy

Topics: Advanced Clustering Algorithms Research · Metaheuristic Optimization Algorithms Research · Face and Expression Recognition

Methods: k-Means Clustering