Nested partitions from hierarchical clustering statistical validation
Christian Bongiorno, Salvatore Miccichè, Rosario N. Mantegna

TL;DR
This paper introduces a fast, scalable greedy algorithm for detecting statistically validated nested partitions in hierarchical clustering dendrograms, outperforming existing methods in speed and scalability.
Contribution
We present a novel greedy algorithm that efficiently computes p-values for nested clusters, improving speed and scalability over existing methods like Pvclust.
Findings
Our algorithm is significantly faster than Pvclust.
It scales better than Pvclust to larger datasets with more records.
It provides reliable statistical validation of hierarchical clusters.
Abstract
We develop a greedy algorithm that is fast and scalable in the detection of a nested partition extracted from a dendrogram obtained from hierarchical clustering of a multivariate series. Our algorithm provides a p-value for each clade observed in the hierarchical tree. The p-value is obtained by computing a number of bootstrap replicas of the dissimilarity matrix and by performing a statistical test on each difference between the dissimilarity associated with a given clade and the dissimilarity of the clade of its parent node. We prove the efficacy of our algorithm with a set of benchmarks generated by using a hierarchical factor model. We compare the results obtained by our algorithm with those of Pvclust. Pvclust is a widely used algorithm developed with a global approach originally motivated by phylogenetic studies. In our numerical experiments we focus on the role of multiple…
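The per-clade test described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several choices are assumptions made only for illustration: the dissimilarity is taken as 1 minus the Pearson correlation, the dendrogram is built with average linkage, the bootstrap resamples records (columns) with replacement, and the p-value of a clade is estimated as the fraction of replicas in which the parent dissimilarity does not exceed the clade dissimilarity. The function names `correlation_dissimilarity`, `clade_members`, and `clade_pvalues` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def correlation_dissimilarity(X):
    """Dissimilarity d_ij = 1 - rho_ij between the rows of X (assumed choice)."""
    D = 1.0 - np.corrcoef(X)
    np.fill_diagonal(D, 0.0)
    return (D + D.T) / 2.0  # enforce exact symmetry


def clade_members(Z, n):
    """Leaf indices under every node of a SciPy linkage matrix Z."""
    members = [[i] for i in range(n)]
    for left, right, _, _ in Z:
        members.append(members[int(left)] + members[int(right)])
    return members


def clade_pvalues(X, n_boot=200, seed=0):
    """Bootstrap p-value for each non-root clade (illustrative test only)."""
    n, T = X.shape
    D = correlation_dissimilarity(X)
    Z = linkage(squareform(D, checks=False), method="average")
    members = clade_members(Z, n)

    # parent[k] = row of Z that merges internal node n + k into a larger clade
    parent = {}
    for row, (left, right, _, _) in enumerate(Z):
        for child in (int(left), int(right)):
            if child >= n:
                parent[child - n] = row

    def height(D_any, row):
        # Average-linkage dissimilarity of the clade formed at `row`,
        # evaluated on D_any with the original clade membership held fixed.
        a = members[int(Z[row, 0])]
        b = members[int(Z[row, 1])]
        return D_any[np.ix_(a, b)].mean()

    rng = np.random.default_rng(seed)
    counts = {k: 0 for k in parent}
    for _ in range(n_boot):
        cols = rng.integers(0, T, size=T)  # resample records with replacement
        D_b = correlation_dissimilarity(X[:, cols])
        for k, p in parent.items():
            # Count replicas where the clade is NOT tighter than its parent.
            if height(D_b, p) - height(D_b, k) <= 0.0:
                counts[k] += 1

    # Add-one estimate so that no p-value is exactly zero (assumption).
    pvals = {k: (counts[k] + 1) / (n_boot + 1) for k in counts}
    return Z, pvals
```

Calling `clade_pvalues(X)` on an array `X` with one row per element and one column per record returns the linkage matrix and a dictionary mapping each non-root merge (row index of `Z`) to its estimated p-value. A small p-value suggests that the clade is reliably tighter than its parent. The sketch deliberately leaves out any correction for the number of tests performed.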
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Time Series Analysis and Forecasting
