Identifying the number of clusters for K-Means: A hypersphere density based approach

Sukavanan Nanjundan; Shreeviknesh Sankaran; C.R. Arjun; G. Paavai; Anand

arXiv:1912.00643·cs.LG·December 5, 2019·41 cites

Open Access

TL;DR

This paper introduces a simple, robust hypersphere density-based method to determine the optimal number of clusters in K-Means, eliminating the need for prior knowledge or complex calculations.

Contribution

The paper proposes a novel hypersphere density approach to identify the optimal number of clusters in K-Means without requiring parametric assumptions or ad hoc methods.

Findings

01

The method effectively determines the optimal cluster number using hypersphere density.

02

It provides reliable results across different datasets.

03

The approach is simple and easy to implement.

Abstract

The K-Means algorithm requires the number of clusters to be known beforehand, which restricts its application. Previously suggested methods to solve this problem are either ad hoc or require parametric assumptions and complicated calculations. The proposed method addresses this by using cluster hypersphere density as the factor that determines the number of clusters in a given dataset. The density is calculated by assuming a hypersphere around each cluster centroid, for n different numbers of clusters. The calculated values are plotted against their corresponding number of clusters, and the optimum number of clusters is obtained by examining the elbow region of the graph. The method is simple, easy to comprehend, and provides robust and reliable results.
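The procedure in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact density formula, so the sketch assumes each cluster's hypersphere radius is the distance from the centroid to its farthest member, that density is the member count divided by the hypersphere volume, and that the per-cluster densities are averaged for each k. The function names (`kmeans`, `hypersphere_volume`, `mean_hypersphere_density`, `density_curve`) are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-Means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def hypersphere_volume(radius, dim):
    """Volume of a dim-dimensional ball of the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def mean_hypersphere_density(points, centroids, labels):
    """ASSUMPTION: density = members / volume of the ball reaching the
    farthest member, averaged over clusters. The paper may define it differently."""
    dim = len(points[0])
    densities = []
    for j, centroid in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue  # empty cluster contributes nothing
        radius = max(math.dist(p, centroid) for p in members)
        if radius == 0:
            continue  # degenerate cluster (all members at the centroid)
        densities.append(len(members) / hypersphere_volume(radius, dim))
    return sum(densities) / len(densities) if densities else 0.0


def density_curve(points, k_values, seed=0):
    """Return [(k, mean density)] for each candidate number of clusters."""
    curve = []
    for k in k_values:
        centroids, labels = kmeans(points, k, seed=seed)
        curve.append((k, mean_hypersphere_density(points, centroids, labels)))
    return curve


if __name__ == "__main__":
    # Synthetic 2-D data with three blobs (illustrative only).
    rng = random.Random(42)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [(rng.gauss(cx, 0.4), rng.gauss(cy, 0.4))
            for cx, cy in centers for _ in range(30)]
    for k, density in density_curve(data, range(1, 8)):
        print(k, round(density, 3))
```

Plotting the printed densities against k and inspecting where the curve bends corresponds to the elbow step the abstract describes. The sketch leaves that inspection to the reader rather than automating it, since the abstract does not specify an elbow-detection rule.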

Taxonomy

Topics: Data Mining and Machine Learning Applications · Leaf Properties and Growth Measurement