K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters

Seyed Omid Mohammadi, Ahmad Kalhor, Hossein Bodaghi (School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran)

arXiv:2110.04660·cs.CV·May 25, 2022

TL;DR

K-splits is an improved hierarchical clustering algorithm based on k-means that automatically determines the number of clusters, offering high accuracy and speed on synthetic and real-world datasets, and can enhance k-means initialization.

Contribution

The paper introduces k-splits, a novel hierarchical clustering method that automatically detects the optimal number of clusters, improving accuracy and efficiency over existing methods.

Findings

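01 K-splits finds the correct number of clusters with high accuracy on six synthetic benchmark datasets and on two real-world datasets, MNIST and Fashion-MNIST.

02 K-splits is faster than similar methods and, in lower dimensions, can be faster than standard k-means.

03 Using the centroids found by k-splits to initialize k-means improves clustering results.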
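
The initialization idea in the last finding can be illustrated with a minimal sketch: plain Lloyd's k-means that starts from caller-supplied centroids instead of a random draw. This is not the authors' code. The function name `kmeans_from_centroids`, the toy points, and the rough starting centroids (standing in for what k-splits would output) are all assumptions made for illustration.

```python
from math import dist


def kmeans_from_centroids(points, centroids, max_iter=100):
    """Lloyd's k-means started from caller-supplied centroids (no random init)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


# Toy data: two well-separated groups (hypothetical, not from the paper).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
# Hypothetical rough centroids, standing in for the output of k-splits.
rough = [(1.0, 1.0), (9.0, 9.0)]
centroids, labels = kmeans_from_centroids(points, rough)
```

Because the number of clusters and the starting positions are both fixed by the input centroids, the refinement is deterministic. With scikit-learn, the same effect can be had by passing an array of centroids as the `init` argument of `KMeans`.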

Abstract

This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and uses the most significant data distribution axis to split these clusters incrementally into better fits if needed. Accuracy and speed are two main advantages of the proposed method. We experiment on six synthetic benchmark datasets plus two real-world datasets, MNIST and Fashion-MNIST, to prove that our algorithm has excellent accuracy in finding the correct number of clusters under different conditions. We also show that k-splits is faster than similar methods and can even be faster than the standard k-means in lower dimensions. Finally, we suggest using k-splits to uncover the exact position of centroids and then input them as initial points to the k-means algorithm to fine-tune the…
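
The abstract describes splitting a cluster along its "most significant data distribution axis" but gives no further detail here. One plausible reading, sketched below, takes that axis to be the dominant eigenvector of the cluster's covariance matrix and cuts the cluster at its mean along that direction. This is an assumption, not the paper's procedure: the function names, the power-iteration eigenvector estimate, the mean-cut rule, and the toy data are all hypothetical, and the criterion the authors use to decide whether a split is needed is not reproduced.

```python
import random


def principal_axis(points, iters=200, seed=0):
    """Return (mean, unit vector) for the direction of largest variance.

    Uses power iteration on the covariance matrix; a random start makes it
    unlikely that the initial vector is orthogonal to the dominant axis.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - mean[i] for i in range(d)] for p in points]
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(d)]
        for i in range(d)
    ]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # degenerate cluster: all points coincide
        v = [x / norm for x in w]
    return mean, v


def split_along_principal_axis(points):
    """Cut a cluster in two at its mean, along its principal axis (assumed rule)."""
    mean, axis = principal_axis(points)
    left, right = [], []
    for p in points:
        # Signed position of the point along the axis, relative to the mean.
        score = sum((p[i] - mean[i]) * axis[i] for i in range(len(axis)))
        (left if score < 0 else right).append(p)
    return left, right


# Toy cluster that is really two groups spread along the x-axis (hypothetical).
low = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
high = [(10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0)]
left, right = split_along_principal_axis(low + high)
```

Which half ends up in `left` depends on the sign of the estimated axis, so callers should treat the two halves as unordered. A hierarchical method in the spirit of the abstract would apply such a split repeatedly, keeping a split only when it yields a better fit.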
