DBSCAN++: Towards fast and scalable density clustering

Jennifer Jang; Heinrich Jiang

arXiv:1810.13105 · cs.LG · May 21, 2019 · 43 cites

TL;DR

DBSCAN++ modifies DBSCAN to compute empirical densities for only a chosen subset of points rather than for every sample. This lowers runtime and adds robustness to the bandwidth hyperparameter while retaining statistical consistency guarantees and optimal estimation rates.

Contribution

The paper introduces a sampling-based modification of DBSCAN that is faster than the original, more robust to the bandwidth hyperparameter, and still statistically consistent with optimal estimation rates.

Findings

01

DBSCAN++ is sub-quadratic in runtime.

02

It maintains minimax optimal rates for level-set estimation.

03

In experiments, DBSCAN++ is competitive with traditional DBSCAN while taking a fraction of the runtime.

Abstract

DBSCAN is a classical density-based clustering procedure with tremendous practical relevance. However, DBSCAN implicitly needs to compute the empirical density for each sample point, leading to a quadratic worst-case time complexity, which is too slow on large datasets. We propose DBSCAN++, a simple modification of DBSCAN which only requires computing the densities for a chosen subset of points. We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness in the bandwidth hyperparameter while taking a fraction of the runtime. We also present statistical consistency guarantees showing the trade-off between computational cost and estimation rates. Surprisingly, up to a certain point, we can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that…
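The core idea in the abstract, computing densities for only a chosen subset of points, can be illustrated with a minimal sketch. This is not the authors' reference implementation. It assumes uniform random sampling of `m` points, brute-force Euclidean distances, and a simple union-find to link each sampled core point to its `eps`-neighbours. The function name `dbscan_pp_sketch` and all parameter names are hypothetical.

```python
import numpy as np


def dbscan_pp_sketch(X, eps, min_pts, m, seed=0):
    """Illustrative sampling-based density clustering (assumed design).

    Returns one integer label per row of X; -1 marks unclustered points.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: the subset is drawn uniformly at random without replacement.
    sample_idx = rng.choice(n, size=min(m, n), replace=False)

    # Densities are computed only for the sampled points:
    # an (m, n) distance matrix instead of the full (n, n) one.
    diffs = X[sample_idx][:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    within = dists <= eps
    is_core = within.sum(axis=1) >= min_pts

    # Union-find over all points, used to merge neighbourhoods of core points.
    parent = np.arange(n)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    covered = np.zeros(n, dtype=bool)
    for c, row in zip(sample_idx[is_core], within[is_core]):
        neighbours = np.flatnonzero(row)
        covered[neighbours] = True
        root_c = find(c)
        for j in neighbours:
            root_j = find(j)
            if root_j != root_c:
                parent[root_j] = root_c

    # Points not within eps of any sampled core point stay labelled -1.
    labels = np.full(n, -1)
    roots = {}
    for i in np.flatnonzero(covered):
        labels[i] = roots.setdefault(find(i), len(roots))
    return labels


# Example call on synthetic data (two blobs); parameter values are arbitrary.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.3, (200, 2)),
                    rng.normal(5.0, 0.3, (200, 2))])
labels_demo = dbscan_pp_sketch(X_demo, eps=0.5, min_pts=5, m=60)
```

In this sketch the density step needs m × n distance computations rather than n × n, so the cost falls as `m` shrinks. How small `m` can be before estimation quality degrades is the trade-off the paper analyses.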

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Data Management and Algorithms