Determining Optimal Number of k-Clusters based on Predefined Level-of-Similarity

Rabindra Lamsal; Shubham Katiyar

arXiv:1810.01878 · cs.LG · October 8, 2020


TL;DR

This paper introduces a centroid-based clustering algorithm that determines the number of clusters dynamically from a predefined similarity threshold. This makes it suitable for analyzing streaming data without specifying the cluster count in advance.

Contribution

The paper presents a novel clustering method that eliminates the need to specify the number of clusters beforehand by using a similarity measure and a level-of-similarity threshold.

Findings

1. Effective for clustering streaming data
2. Automatically determines the number of clusters
3. Operates based on a predefined similarity threshold

Abstract

This paper proposes a centroid-based clustering algorithm that can cluster data-points with n features without requiring the number of clusters to be specified in advance. The core of the algorithm is a similarity measure that decides whether to assign an incoming data-point to a pre-existing cluster or to create a new cluster and assign the data-point to it. The proposed algorithm is application-specific. It applies when clustering a stream of data-points where the similarity between each incoming data-point and the cluster it is associated with must be greater than the predefined Level-of-Similarity.
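The assignment rule described in the abstract can be sketched as a single pass over the stream. This is a minimal sketch under stated assumptions, not the authors' CS-means implementation. The similarity function `1 / (1 + Euclidean distance)`, the running-mean centroid update, and the names `StreamingThresholdClusterer` and `level_of_similarity` are all hypothetical choices made for illustration. The paper defines its own similarity measure.

```python
import math
from typing import List, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical similarity in (0, 1]: 1 / (1 + Euclidean distance).

    Assumption: a stand-in for the paper's own similarity measure.
    """
    return 1.0 / (1.0 + math.dist(a, b))


class StreamingThresholdClusterer:
    """Sketch of threshold-based, centroid-based clustering of a data stream."""

    def __init__(self, level_of_similarity: float) -> None:
        self.level_of_similarity = level_of_similarity
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []  # number of points in each cluster

    def add(self, point: Sequence[float]) -> int:
        """Assign `point` to a cluster and return that cluster's index."""
        if self.centroids:
            # Score the incoming point against every existing centroid.
            scores = [similarity(point, c) for c in self.centroids]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] > self.level_of_similarity:
                # Similar enough: join the best cluster and update its
                # centroid as a running mean of its members (an assumption).
                self.counts[best] += 1
                n = self.counts[best]
                self.centroids[best] = [
                    c + (x - c) / n for c, x in zip(self.centroids[best], point)
                ]
                return best
        # No existing cluster passes the threshold: open a new one.
        self.centroids.append(list(point))
        self.counts.append(1)
        return len(self.centroids) - 1


if __name__ == "__main__":
    stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.0)]
    clusterer = StreamingThresholdClusterer(level_of_similarity=0.5)
    labels = [clusterer.add(p) for p in stream]
    print(labels, len(clusterer.centroids))  # prints: [0, 0, 1, 1, 0] 2
```

The number of clusters is never passed in. It emerges from the threshold. Raising `level_of_similarity` makes the assignment test stricter, so more clusters are opened. On the same five points, a threshold of 0.9 yields four clusters instead of two under this hypothetical similarity measure.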


Code & Models

Repositories

rabindralamsal/cs-means (official)
