Evaluating and Validating Cluster Results
Anupriya Vysala, Joseph Gomes

TL;DR
This paper compares external and internal evaluation methods for clustering quality using the IRIS dataset, employing various metrics and visualization tools to validate clustering results.
Contribution
It provides a comprehensive comparison of evaluation techniques for clustering, combining quantitative metrics with visual analysis on a standard dataset.
Findings
External evaluation metrics show high homogeneity and correctness.
Internal measures such as the Silhouette Index help validate the number of clusters.
Frequency distribution visualizations aid in interpreting clustering results.
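As a rough illustration of the external metrics named in the findings, the sketch below scores a clustering of the Iris data against its true species labels. It assumes scikit-learn, k-means as the clustering algorithm, three clusters, and a fixed seed; none of these choices is confirmed by the summary, and `external_scores` is a hypothetical helper name. It also assumes that the "correctness" score corresponds to scikit-learn's `completeness_score`, since V-measure is defined as the harmonic mean of homogeneity and completeness.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import completeness_score, homogeneity_score, v_measure_score


def external_scores(n_clusters=3, seed=0):
    """Cluster Iris and compare the result with the known species labels.

    Assumption: k-means stands in for whatever algorithm the paper used.
    """
    X, y_true = load_iris(return_X_y=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    return {
        # 1.0 when every cluster contains members of a single class only
        "homogeneity": homogeneity_score(y_true, labels),
        # 1.0 when all members of a class fall into the same cluster
        "completeness": completeness_score(y_true, labels),
        # harmonic mean of the two scores above (beta = 1)
        "v_measure": v_measure_score(y_true, labels),
    }


scores = external_scores()
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

All three scores lie between 0 and 1 and do not depend on how the cluster ids are numbered, so no label matching is needed before scoring.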
Abstract
Clustering is a technique for partitioning data according to their characteristics. Data that are similar in nature belong to the same cluster [1]. There are two types of methods for evaluating clustering quality. One is external evaluation, where the true labels of the data set are known in advance; the other is internal evaluation, in which the evaluation is done with the data set itself, without true labels. In this paper, both external and internal evaluation are performed on the cluster results of the IRIS dataset. For external evaluation, Homogeneity, Correctness and V-measure scores are calculated for the dataset. For internal performance measures, the Silhouette Index and Sum of Square Errors are used. These internal performance measures, along with the dendrogram (a graphical tool from hierarchical clustering), are used first to validate the number of…
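The internal measures mentioned in the abstract can be sketched in a similar way. The example below computes the Sum of Square Errors and the Silhouette Index for several candidate cluster counts on Iris, without using the true labels. It assumes scikit-learn and k-means (whose `inertia_` attribute is the within-cluster sum of squared distances); the candidate range 2 to 6 and the helper name `internal_scores` are illustrative choices, not taken from the paper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score


def internal_scores(k_values=range(2, 7), seed=0):
    """Return SSE and mean silhouette for each candidate number of clusters.

    Assumption: k-means stands in for whatever algorithm the paper used.
    """
    X, _ = load_iris(return_X_y=True)  # true labels are deliberately ignored
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = {
            # sum of squared distances of samples to their nearest centroid
            "sse": model.inertia_,
            # mean silhouette coefficient over all samples, in [-1, 1]
            "silhouette": silhouette_score(X, model.labels_),
        }
    return results


internal = internal_scores()
for k, row in internal.items():
    print(f"k={k}: SSE={row['sse']:.2f}, silhouette={row['silhouette']:.3f}")
```

SSE generally shrinks as the number of clusters grows, so it is usually read by looking for an "elbow" rather than a minimum, whereas a higher silhouette value indicates better-separated clusters.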
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications · Data Management and Algorithms
