TL;DR
This paper introduces a hybrid clustering method that combines HDBSCAN and DBSCAN* by applying an additional threshold during cluster selection, improving results on data of variable density without modifying the original hierarchy.
Contribution
It proposes a simple threshold-based hybrid approach that improves cluster selection on variable-density data and can be added to existing HDBSCAN implementations.
Findings
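Reduces micro-clusters in high-density regions, even when a low minimum cluster size is required
Applies directly to HDBSCAN's tree of cluster candidates, without modifying the hierarchy
Yields a combination of DBSCAN* and HDBSCAN clusters on data of variable density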
Abstract
HDBSCAN is a density-based clustering algorithm that constructs a cluster hierarchy tree and then uses a specific stability measure to extract flat clusters from the tree. We show how the application of an additional threshold value can result in a combination of DBSCAN* and HDBSCAN clusters, and demonstrate potential benefits of this hybrid approach when clustering data of variable densities. In particular, our approach is useful in scenarios where we require a low minimum cluster size but want to avoid an abundance of micro-clusters in high-density regions. The method can directly be applied to HDBSCAN's tree of cluster candidates and does not require any modifications to the hierarchy itself. It can easily be integrated as an addition to existing HDBSCAN implementations.
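The abstract describes the threshold as a step applied to the tree of cluster candidates after the usual stability-based selection. The sketch below is one possible reading of that idea on a toy tree, not the paper's actual procedure: the `Candidate` class, the `birth_eps` field (the distance scale at which a candidate split off from its sibling), the `apply_threshold` function, and all numeric values are hypothetical and chosen only for illustration. It assumes that a selected cluster born below the threshold is replaced by its nearest ancestor born at or above the threshold, which merges micro-clusters in dense regions while leaving clusters in sparser regions untouched.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Candidate:
    """One node in a toy tree of cluster candidates (hypothetical structure)."""
    name: str
    parent: Optional[str]  # None for the root
    birth_eps: float       # assumed: distance scale at which this candidate appeared


def apply_threshold(tree: Dict[str, Candidate],
                    selected: Set[str],
                    eps_threshold: float) -> Set[str]:
    """Replace each selected cluster born below eps_threshold by an ancestor.

    The tree itself is never modified; only the set of selected nodes changes.
    Climbing stops just below the root so the whole data set is never
    returned as a single cluster (an assumption of this sketch).
    """
    result = set()
    for name in selected:
        node = tree[name]
        while (node.birth_eps < eps_threshold
               and node.parent is not None
               and tree[node.parent].parent is not None):
            node = tree[node.parent]
        result.add(node.name)  # siblings that climb to the same ancestor merge
    return result


# Toy hierarchy: region A is dense and splits at a small distance scale,
# region B is sparse and splits at a larger one.
tree = {
    "root": Candidate("root", None, float("inf")),
    "A": Candidate("A", "root", 5.0),
    "B": Candidate("B", "root", 5.0),
    "A1": Candidate("A1", "A", 0.3),
    "A2": Candidate("A2", "A", 0.3),
    "B1": Candidate("B1", "B", 2.0),
    "B2": Candidate("B2", "B", 2.0),
}

# Assumed outcome of the stability-based selection: all four leaves.
selected = {"A1", "A2", "B1", "B2"}

print(sorted(apply_threshold(tree, selected, eps_threshold=0.5)))
```

With the threshold at 0.5, the two dense-region candidates `A1` and `A2` collapse into `A`, while `B1` and `B2` are kept as selected. A threshold of 0 leaves the selection unchanged, and a very large threshold merges everything up to the top-level candidates, so the result moves between the two behaviours depending on the threshold.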
