Experimental Estimation of Number of Clusters Based on Cluster Quality

G. Hannah Grace; Kalyani Desikan

arXiv:1503.03168 · cs.IR · March 12, 2015

TL;DR

This paper shows experimentally how to estimate the number of clusters in text clustering from cluster quality, addressing the fact that most clustering algorithms require the cluster count to be fixed in advance.

Contribution

It introduces an experimental approach to determine the number of clusters based on cluster quality metrics, specifically for partitional clustering algorithms in text mining.

Findings

1. Cluster quality metrics can effectively estimate the optimal number of clusters.

2. The approach improves clustering organization without prior knowledge of cluster count.

3. Experimental results validate the method's applicability to large text datasets.

Abstract

Text clustering is a text mining technique that divides a given set of text documents into meaningful clusters. It is used to organize a large number of text documents into a well-structured form. Most clustering algorithms require the number of clusters to be specified a priori, which is a drawback of these algorithms. The aim of this paper is to show experimentally how to determine the number of clusters based on cluster quality. Since partitional clustering algorithms are well suited to clustering large document datasets, we have confined our analysis to a partitional clustering algorithm.
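
The idea in the abstract, choosing the number of clusters by comparing cluster quality across candidate values, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not name the quality measure or the algorithm, so the sketch assumes the silhouette coefficient as the quality measure and plain k-means as the partitional algorithm. Toy 2-D points stand in for document vectors, and all function names (`farthest_first_centers`, `kmeans`, `silhouette`) are hypothetical.

```python
import math

def farthest_first_centers(points, k):
    # Deterministic seeding (an assumption of this sketch): start from the
    # first point, then repeatedly add the point farthest from all centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers

def kmeans(points, k, max_iter=100):
    # Plain k-means: assign each point to its nearest center, recompute
    # centers as cluster means, and stop when the centers no longer move.
    centers = farthest_first_centers(points, k)
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    # Mean silhouette coefficient: values near 1 indicate compact,
    # well-separated clusters. Singleton clusters contribute 0.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Toy data: three tight groups of five points each (stand-ins for document vectors).
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.0, 0.0)]
group_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

# Score every candidate cluster count and keep the one with the best quality.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On this toy data the quality score peaks at k = 3, which matches the three groups used to build the points. With real text, the points would typically be high-dimensional document vectors (for example TF-IDF), and both the quality measure and the range of candidate k values would need to be chosen to suit the collection.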
