Persistent Multiscale Density-based Clustering
Daniël Bot, Leland McInnes, Jan Aerts

TL;DR
This paper introduces PLSCAN, a new density-based clustering algorithm that identifies stable clusters across scales, reducing hyperparameter sensitivity and improving robustness in exploratory data analysis.
Contribution
The paper presents PLSCAN, a novel scale-space clustering method based on persistent homology, which efficiently finds stable clusters without extensive hyperparameter tuning.
Findings
PLSCAN outperforms HDBSCAN* in clustering accuracy, achieving a higher adjusted Rand index (ARI).
PLSCAN is less sensitive to the number of neighbors.
Its computational cost is competitive, especially on low-dimensional data.
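For readers unfamiliar with the ARI mentioned in the findings, the following is a minimal sketch of the standard adjusted Rand index computed from a contingency table. The function name `adjusted_rand_index` and the toy labelings are illustrative assumptions, not taken from the paper, and the sketch does not reproduce how the paper treats noise points in its evaluation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI between two labelings of the same points (illustrative helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: nothing to disagree on

    # Count point pairs that share a cluster in both, the first, or the second labeling.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    pairs_both = sum(comb(c, 2) for c in cells.values())
    pairs_true = sum(comb(c, 2) for c in rows.values())
    pairs_pred = sum(comb(c, 2) for c in cols.values())

    expected = pairs_true * pairs_pred / total_pairs  # agreement expected by chance
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (pairs_both - expected) / (maximum - expected)


# Renaming the clusters does not change the score.
same_partition = adjusted_rand_index([0, 0, 1, 1], [5, 5, 9, 9])
# Splitting one true cluster in two lowers the score.
one_split = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
```

An ARI of 1 means the two partitions are identical up to label names, values near 0 indicate chance-level agreement, and negative values indicate less agreement than chance.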
Abstract
Clustering is a cornerstone of modern data analysis. Detecting clusters in exploratory data analyses (EDA) requires algorithms that make few assumptions about the data. Density-based clustering algorithms are particularly well-suited for EDA because they describe high-density regions, assuming only that a density exists. Applying density-based clustering algorithms in practice, however, requires selecting appropriate hyperparameters, which is difficult without prior knowledge of the data distribution. For example, DBSCAN requires selecting a density threshold, and HDBSCAN* relies on a minimum cluster size parameter. In this work, we propose Persistent Leaves Spatial Clustering for Applications with Noise (PLSCAN). This novel density-based clustering algorithm efficiently identifies all minimum cluster sizes for which HDBSCAN* produces stable (leaf) clusters. PLSCAN applies scale-space…
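To illustrate why choosing a density threshold is hard without prior knowledge, here is a toy sketch on one-dimensional data. It is a deliberately simplified stand-in, not DBSCAN, HDBSCAN*, or PLSCAN: points are linked when consecutive gaps are at most `eps`, and groups smaller than `min_size` are treated as noise. The function `threshold_clusters`, its parameters, and the data are all assumptions made for illustration.

```python
def threshold_clusters(points, eps, min_size=2):
    """Group 1-D points whose consecutive gaps are <= eps; drop small groups as noise."""
    groups = []
    current = []
    for x in sorted(points):
        # A gap larger than eps closes the current group and starts a new one.
        if current and x - current[-1] > eps:
            groups.append(current)
            current = []
        current.append(x)
    if current:
        groups.append(current)
    return [g for g in groups if len(g) >= min_size]


# Two tight groups and one looser group, all at different scales.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 5.0, 5.5, 6.0]

# The number of clusters found depends strongly on the chosen threshold.
cluster_counts = {
    eps: len(threshold_clusters(points, eps))
    for eps in (0.05, 0.15, 0.6, 1.0, 4.0)
}
```

In this toy data the cluster count moves from 0 to 2 to 3, then back to 2 and 1 as `eps` grows, so no single threshold is obviously right. This is the kind of sensitivity that motivates looking for clusters that remain stable across scales.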
Taxonomy
Topics: Topological and Geometric Data Analysis · Advanced Clustering Algorithms Research · Data Visualization and Analytics
