A Data-Driven Approach to Estimating the Number of Clusters in Hierarchical Clustering
Antoine Zambelli

TL;DR
This paper introduces two new data-driven, fully automated methods for estimating the number of clusters in hierarchical clustering and shows that they outperform the Gap statistic and Elbow methods in multi-cluster scenarios on simulated data and the Biobase Gene Expression Set.
Contribution
The paper presents two novel methods for estimating the number of clusters in hierarchical clustering that are easy to implement, computationally inexpensive, and require no input from the researcher.
Findings
Outperform the Gap statistic and Elbow methods in multi-cluster scenarios
Effective on simulated data sets and the Biobase Gene Expression Set
Fully automated with no human intervention
Abstract
We propose two new methods for estimating the number of clusters in a hierarchical clustering framework, with the aim of creating a fully automated process with no human intervention. The methods are completely data-driven and require no input from the researcher. They are easy to implement and computationally inexpensive. We analyze their performance on several simulated data sets and the Biobase Gene Expression Set, comparing them to the established Gap statistic and Elbow methods; our methods outperform both in multi-cluster scenarios.
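The abstract names the Elbow method as a baseline but does not describe it. For readers unfamiliar with it, here is a minimal sketch of one common Elbow variant in a hierarchical setting. It is not either of the paper's proposed methods. The specific choices are assumptions made for illustration: Ward linkage on Euclidean data, within-cluster sum of squares W(k) as the curve, and a second-difference rule to locate the "elbow". The helper names `ward_partitions`, `within_ss` and `elbow_k` are hypothetical.

```python
import math
import random


def centroid(points, idx):
    """Mean of the points whose indices are in idx."""
    dim = len(points[0])
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]


def within_ss(points, clusters):
    """Total within-cluster sum of squared distances to each cluster centroid."""
    total = 0.0
    for idx in clusters:
        c = centroid(points, idx)
        total += sum(math.dist(points[i], c) ** 2 for i in idx)
    return total


def ward_cost(points, a, b):
    """Increase in within-cluster sum of squares caused by merging clusters a and b."""
    ca, cb = centroid(points, a), centroid(points, b)
    return len(a) * len(b) / (len(a) + len(b)) * math.dist(ca, cb) ** 2


def ward_partitions(points, k_max):
    """Naive agglomerative clustering with Ward's criterion (assumed linkage).

    Returns {k: list of clusters (lists of point indices)} for k = 1..k_max,
    i.e. the dendrogram cut at every level of interest.
    """
    clusters = [[i] for i in range(len(points))]
    partitions = {}
    while True:
        if len(clusters) <= k_max:
            partitions[len(clusters)] = [list(c) for c in clusters]
        if len(clusters) == 1:
            break
        # Find the pair whose merge increases the within-cluster SS the least.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = ward_cost(points, clusters[a], clusters[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return partitions


def elbow_k(points, k_max):
    """Pick k where the W(k) curve bends most sharply (requires k_max >= 3).

    The second difference (W(k-1) - W(k)) - (W(k) - W(k+1)) is large when the
    drop into k is steep but the drop out of k is shallow.
    """
    parts = ward_partitions(points, k_max)
    w = {k: within_ss(points, parts[k]) for k in range(1, k_max + 1)}
    return max(range(2, k_max), key=lambda k: (w[k - 1] - w[k]) - (w[k] - w[k + 1]))


if __name__ == "__main__":
    # Toy data: three well-separated Gaussian blobs of 10 points each.
    rng = random.Random(0)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
              for cx, cy in centers for _ in range(10)]
    print(elbow_k(points, k_max=6))
```

On cleanly separated blobs like these, the second-difference rule recovers the generated number of groups. The rule needs a hand-chosen `k_max`, and it can be ambiguous when the W(k) curve bends gradually. That kind of sensitivity is what fully automated estimators of the number of clusters aim to avoid.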
