Efficient Computation of Multiple Density-Based Clustering Hierarchies
Antonio Cavalcante Araujo Neto, Joerg Sander, Ricardo J. G. B. Campello, Mario A. Nascimento

TL;DR
This paper introduces an efficient method to compute multiple density-based clustering hierarchies across a range of parameters, significantly reducing computational costs compared to running the standard algorithm repeatedly.
Contribution
The authors propose a novel approach that replaces the graph in HDBSCAN* with a smaller one, enabling fast computation of multiple hierarchies for different parameter values.
Findings
Over 100 hierarchies can be computed for roughly the cost of running HDBSCAN* twice.
The method maintains the accuracy of the original HDBSCAN* hierarchies.
Experimental results demonstrate substantial efficiency improvements.
Abstract
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts in the sense that a small change in mpts typically leads to only a small or no change in the clustering structure, choosing a "good" mpts value can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may reveal themselves at different values of mpts. To explore results for a range of mpts values, however, one has to run HDBSCAN* for each value in the range independently, which is computationally inefficient. In this paper, we propose an efficient approach to compute all HDBSCAN* hierarchies for a range of mpts values by replacing the graph used by HDBSCAN* with a much smaller graph that is…
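To make the cost of the repeated baseline concrete, the following is a minimal sketch of the per-value computation the abstract describes as inefficient, not the paper's proposed method. It assumes the usual HDBSCAN* definitions: the core distance of a point is the distance to its mpts-th nearest neighbor (counting the point itself), and the mutual reachability distance between two points is the maximum of their two core distances and their plain distance. The function names (`core_distances`, `mutual_reachability_mst`, `naive_msts`) are hypothetical, and the sketch stops at the minimum spanning tree from which a hierarchy would be extracted.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix (O(n^2) memory, fine for a sketch)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def core_distances(dist, mpts):
    """Distance to the mpts-th nearest neighbor, counting the point itself.

    Assumed convention: with mpts = 1 every core distance is 0.
    """
    return [sorted(row)[mpts - 1] for row in dist]


def mutual_reachability_mst(dist, mpts):
    """Prim's algorithm on the complete mutual reachability graph.

    Returns a list of (u, v, weight) edges forming a minimum spanning tree.
    """
    n = len(dist)
    core = core_distances(dist, mpts)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = max(core[u], core[v], dist[u][v])  # mutual reachability
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def naive_msts(points, mpts_values):
    """Baseline: one independent O(n^2) MST computation per mpts value."""
    dist = pairwise_distances(points)
    return {m: mutual_reachability_mst(dist, m) for m in mpts_values}
```

Each call to `mutual_reachability_mst` scans all O(n^2) pairs, so exploring k values of mpts this way costs about k times a single run. That repeated cost is what the paper's smaller replacement graph is meant to avoid.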
