Multi-rank Sparse Hierarchical Clustering
Hongyang Zhang, Ruben H. Zamar

TL;DR
This paper introduces Multi-rank Sparse Hierarchical Clustering (MrSHC), a new method designed to improve clustering accuracy and feature selection in large, flat datasets with many noise features, outperforming existing methods.
Contribution
The paper proposes MrSHC, a novel hierarchical clustering framework that effectively handles complex clustering structures and noise features in high-dimensional data.
Findings
MrSHC outperforms classical hierarchical clustering in feature selection.
MrSHC achieves better clustering accuracy in simulations and real data.
MrSHC effectively identifies relevant features in noisy, high-dimensional datasets.
Abstract
There has been a surge in the number of large and flat data sets - data sets containing a large number of features and a relatively small number of observations - due to the growing ability to collect and store information in medical research and other fields. Hierarchical clustering is a widely used clustering tool. In hierarchical clustering, large and flat data sets may allow for better coverage of clustering features (features that help explain the true underlying clusters), but such data sets usually include a large fraction of noise features (non-clustering features) that may hide the underlying clusters. Witten and Tibshirani (2010) proposed a sparse hierarchical clustering framework to cluster the observations using an adaptively chosen subset of the features. However, we show that this framework has some limitations when the data sets contain clustering features with complex…
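The abstract's remark that noise features may hide the underlying clusters can be illustrated with a small simulation. The sketch below is not MrSHC and not the Witten and Tibshirani framework. It runs a naive average-linkage hierarchical clustering (with squared Euclidean distance) on a synthetic "large and flat" data set, once using only the clustering features and once using all features. Every name and parameter here (make_data, average_linkage, accuracy, the cluster shift of 3.0, the 5 clustering and 200 noise features) is a hypothetical choice made for illustration.

```python
import random


def make_data(n_per_cluster=10, n_signal=5, n_noise=200, shift=3.0, seed=0):
    """Two clusters that differ only in the first n_signal features (assumed setup)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_cluster):
            # Clustering features: mean depends on the cluster label.
            signal = [rng.gauss(label * shift, 1.0) for _ in range(n_signal)]
            # Noise features: same distribution in both clusters.
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            rows.append(signal + noise)
            labels.append(label)
    return rows, labels


def sq_dist(a, b, features):
    """Squared Euclidean distance restricted to the given feature indices."""
    return sum((a[j] - b[j]) ** 2 for j in features)


def average_linkage(rows, features, k=2):
    """Naive agglomerative clustering; merges until k clusters remain."""
    n = len(rows)
    d = [[sq_dist(rows[i], rows[j], features) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a stays valid
    assign = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            assign[i] = c
    return assign


def accuracy(assign, labels):
    """Agreement with the true labels for k=2, allowing the two labels to be swapped."""
    agree = sum(a == t for a, t in zip(assign, labels))
    return max(agree, len(labels) - agree) / len(labels)


rows, labels = make_data()
n_signal, n_total = 5, len(rows[0])

# Clustering on the clustering features only versus on all features.
acc_signal = accuracy(average_linkage(rows, range(n_signal)), labels)
acc_all = accuracy(average_linkage(rows, range(n_total)), labels)

print("accuracy, clustering features only:", acc_signal)
print("accuracy, all features:", acc_all)
```

In a real analysis the clustering features are unknown, which is why sparse methods choose the feature subset adaptively. The comparison above only shows what is at stake in that choice under these assumed simulation settings.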
Taxonomy
Topics: Advanced Clustering Algorithms Research · Gene expression and cancer classification · Bayesian Methods and Mixture Models
