Evaluating and Validating Cluster Results

Anupriya Vysala; Joseph Gomes

arXiv:2007.08034·cs.LG·September 5, 2024·1 cites

Open Access

TL;DR

This paper compares external and internal evaluation methods for clustering quality using the IRIS dataset, employing various metrics and visualization tools to validate clustering results.

Contribution

It provides a comprehensive comparison of evaluation techniques for clustering, combining quantitative metrics with visual analysis on a standard dataset.

Findings

01

External evaluation metrics show high homogeneity and correctness.

02

Internal measures such as the Silhouette Index support the choice of the optimal number of clusters.

03

Frequency distribution visualizations aid in interpreting clustering results.

Abstract

Clustering is a technique for partitioning data according to their characteristics. Data that are similar in nature belong to the same cluster [1]. There are two types of methods for evaluating clustering quality. One is external evaluation, where the true labels of the data set are known in advance; the other is internal evaluation, in which the evaluation is done with the data set itself, without true labels. In this paper, both external and internal evaluation are performed on the cluster results of the IRIS dataset. For external evaluation, Homogeneity, Correctness and V-measure scores are calculated for the dataset. For internal evaluation, the Silhouette Index and Sum of Square Errors are used. These internal performance measures, along with the dendrogram (a graphical tool from hierarchical clustering), are used first to validate the number of…
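The measures named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`v_measure`, `sse`, `silhouette`) and the toy data are hypothetical, the score the abstract calls Correctness is assumed to correspond to what the V-measure definition usually calls completeness, and Euclidean distance is assumed for the internal measures.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equally long label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())


def v_measure(truth, pred):
    """External evaluation: (homogeneity, completeness, V-measure)."""
    h_truth, h_pred = entropy(truth), entropy(pred)
    homogeneity = 1.0 if h_truth == 0 else 1.0 - conditional_entropy(truth, pred) / h_truth
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred, truth) / h_pred
    total = homogeneity + completeness
    v = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v


def group_by_cluster(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def sse(points, labels):
    """Internal evaluation: sum of squared distances to each cluster centroid."""
    total = 0.0
    for members in group_by_cluster(points, labels).values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def silhouette(points, labels):
    """Internal evaluation: mean silhouette over all points (needs >= 2 clusters)."""
    clusters = group_by_cluster(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(point, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_truth = [0, 0, 1, 1]
toy_pred = [1, 1, 0, 0]  # same partition as the truth, different label names

external_scores = v_measure(toy_truth, toy_pred)
internal_sse = sse(toy_points, toy_pred)
internal_silhouette = silhouette(toy_points, toy_pred)
```

The external scores depend only on how the two partitions overlap, so renaming the cluster labels leaves them unchanged. The internal scores use only the points and the predicted labels, which is why they can be computed when no true labels are available.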

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications · Data Management and Algorithms