Hierarchical clustering: visualization, feature importance and model selection
Luben M. C. Cabezas, Rafael Izbicki, Rafael B. Stern

TL;DR
This paper introduces new methods for analyzing hierarchical clustering that leverage the full dendrogram structure, providing better visualization, feature importance, and model selection tools than traditional single-cut approaches.
Contribution
It presents a framework that treats a dendrogram as a phylogeny, so that the analysis draws on the entire hierarchical structure instead of a single partition obtained by cutting the tree.
Findings
Experiments on real and simulated datasets provide evidence that the proposed methods have desirable outcomes.
The visualization and feature importance tools use the whole dendrogram rather than a single cut.
The framework is implemented in an R package.
Abstract
We propose methods for the analysis of hierarchical clustering that fully use the multi-resolution structure provided by a dendrogram. Specifically, we propose a loss for choosing between clustering methods, a feature importance score and a graphical tool for visualizing the segmentation of features in a dendrogram. Current approaches to these tasks lead to loss of information since they require the user to generate a single partition of the instances by cutting the dendrogram at a specified level. Our proposed methods, instead, use the full structure of the dendrogram. The key insight behind the proposed methods is to view a dendrogram as a phylogeny. This analogy permits the assignment of a feature value to each internal node of a tree through an evolutionary model. Real and simulated datasets provide evidence that our proposed framework has desirable outcomes and gives more insights…
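To make the phylogeny analogy concrete, the sketch below assigns a feature value to each internal node of a small dendrogram. It is not the authors' implementation: it swaps in a simple Brownian-motion-style rule (the branch-length-weighted average used in Felsenstein's pruning recursion) as one possible evolutionary model, and it takes merge heights as branch lengths. The names `Node` and `reconstruct` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """One node of a dendrogram (hypothetical helper type for this sketch)."""
    height: float                   # merge height; 0.0 for leaves
    value: Optional[float] = None   # observed at leaves, estimated at internal nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def reconstruct(node: Node) -> Tuple[float, float]:
    """Fill in node.value bottom-up.

    Returns (estimated value, extra branch length). The extra length
    accounts for the uncertainty of an estimated value, as in
    Felsenstein's recursion under Brownian motion.
    """
    if node.left is None and node.right is None:
        return node.value, 0.0  # leaf: value is observed exactly

    x1, e1 = reconstruct(node.left)
    x2, e2 = reconstruct(node.right)

    # Effective branch lengths from this node down to each child.
    v1 = node.height - node.left.height + e1
    v2 = node.height - node.right.height + e2

    if v1 + v2 == 0.0:
        # Degenerate case (children merged at their own height): plain mean.
        node.value = (x1 + x2) / 2.0
        return node.value, 0.0

    # Inverse-branch-length weighting: the closer child counts more.
    node.value = (x1 * v2 + x2 * v1) / (v1 + v2)
    return node.value, v1 * v2 / (v1 + v2)


# Toy dendrogram: leaves a and b merge at height 1, then join c at height 3.
a, b, c = Node(0.0, 1.0), Node(0.0, 3.0), Node(0.0, 10.0)
ab = Node(1.0, left=a, right=b)
root = Node(3.0, left=ab, right=c)

reconstruct(root)
print("internal node (a, b):", ab.value)
print("root:", root.value)
```

In this toy tree the node joining `a` and `b` receives a value halfway between them, while the root is pulled toward the `(a, b)` side because that branch is shorter than the branch to `c`. Once every internal node carries a value, each feature can be inspected at all levels of the hierarchy, not just at one cut.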
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Data Mining Algorithms and Applications
