Efficient Computation of Multiple Density-Based Clustering Hierarchies

Antonio Cavalcante Araujo Neto; Joerg Sander; Ricardo J. G. B. Campello; Mario A. Nascimento

arXiv:1709.04545 · cs.DB · June 11, 2018

TL;DR

This paper introduces an efficient method to compute density-based clustering hierarchies for a whole range of parameter values, at a fraction of the cost of running HDBSCAN* separately for each value.

Contribution

The authors replace the graph used by HDBSCAN* (a complete graph over all data points) with a much smaller graph, which enables fast computation of the hierarchies for many different mpts values.
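For context, these are the standard HDBSCAN* definitions, which this summary does not spell out (the symbol NN_{m_{pts}} is introduced here for illustration only). For a given mpts, HDBSCAN* weights every pair of points by its mutual reachability distance, which is built from core distances:

```latex
\operatorname{core}_{m_{pts}}(x) = d\bigl(x,\ \mathrm{NN}_{m_{pts}}(x)\bigr)
d_{\mathrm{mreach}}(a, b) = \max\bigl\{\operatorname{core}_{m_{pts}}(a),\ \operatorname{core}_{m_{pts}}(b),\ d(a, b)\bigr\}
```

Here d is the base distance and NN_{m_{pts}}(x) is the mpts-th nearest neighbor of x, counting x itself. HDBSCAN* computes a minimum spanning tree of this complete graph, which has n(n-1)/2 edges for n points, and derives the hierarchy by removing the tree's edges in decreasing order of weight. Because the core distances change with mpts, the edge weights change too, which is why a separate run per mpts value is costly.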

Findings

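1. Over 100 hierarchies can be computed at roughly twice the cost of a single HDBSCAN* run.

2. The resulting hierarchies are the same as those HDBSCAN* produces for each mpts value, because the smaller graph is guaranteed to contain the information HDBSCAN* needs.

3. Experiments show substantial efficiency gains over running HDBSCAN* independently for each mpts value.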

Abstract

HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts in the sense that a small change in mpts typically leads to only a small or no change in the clustering structure, choosing a "good" mpts value can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may reveal themselves at different values of mpts. To explore results for a range of mpts values, however, one has to run HDBSCAN* for each value in the range independently, which is computationally inefficient. In this paper, we propose an efficient approach to compute all HDBSCAN* hierarchies for a range of mpts values by replacing the graph used by HDBSCAN* with a much smaller graph that is…
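A minimal sketch of the naive baseline described in the abstract: one full minimum spanning tree (MST) of the complete mutual reachability graph per mpts value. This is not the authors' method, and all function names are hypothetical. It assumes Euclidean distance and the HDBSCAN* convention that the core distance counts the point itself. One shortcut is included: each point's sorted distance list is computed once and reused for every mpts value. The O(n^2) MST is still repeated for each value, and that repeated work is what the paper's smaller graph avoids. The final step from MST to cluster hierarchy is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Edge = Tuple[int, int, float]  # (u, v, mutual reachability weight)


def core_distances_all(points: List[Point], k_max: int) -> List[List[float]]:
    """core[m - 1][i] = distance from point i to its m-th nearest neighbor,
    counting the point itself (so m = 1 gives 0.0), for m = 1..k_max.
    Each point's sorted distance list is computed once and reused for every m."""
    n = len(points)
    if not 1 <= k_max <= n:
        raise ValueError("k_max must be between 1 and len(points)")
    sorted_dists = [sorted(math.dist(p, q) for q in points) for p in points]
    return [[sorted_dists[i][m - 1] for i in range(n)] for m in range(1, k_max + 1)]


def mreach_mst(points: List[Point], core: List[float]) -> List[Edge]:
    """Prim's algorithm on the complete mutual reachability graph: O(n^2) time."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting v to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax edges from u using d_mreach(u, v) = max(core(u), core(v), d(u, v)).
        for v in range(n):
            if not in_tree[v]:
                w = max(core[u], core[v], math.dist(points[u], points[v]))
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges


def msts_for_range(points: List[Point], k_max: int) -> Dict[int, List[Edge]]:
    """Naive baseline: a separate full MST computation for each mpts in 1..k_max."""
    cores = core_distances_all(points, k_max)
    return {m: mreach_mst(points, cores[m - 1]) for m in range(1, k_max + 1)}


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
for m, mst in msts_for_range(pts, 3).items():
    print(m, round(sum(w for _, _, w in mst), 3))
```

Running this baseline for 100 mpts values repeats the quadratic MST step 100 times. Since core distances never decrease as mpts grows, the total MST weight is non-decreasing in mpts.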
