TL;DR
This paper introduces DBSCAN algorithms that run faster and scale to high-dimensional, general-metric, and streaming settings by assuming that the inliers have low intrinsic dimension.
Contribution
It presents an exact DBSCAN algorithm whose labeling and merging steps run in linear time, a linear-time approximate variant, and a streaming implementation, all under the assumption that the inliers have low intrinsic dimension.
Findings
Significantly reduces computational complexity in high-dimensional spaces.
Effective in streaming data scenarios with memory independent of input size.
Experimental results show improved speed over existing DBSCAN methods.
Abstract
DBSCAN is a popular density-based clustering algorithm that has many different applications in practice. However, the running time of DBSCAN in high-dimensional space or general metric space (e.g., clustering a set of texts by using edit distance) can be as large as quadratic in the input size. Moreover, most existing acceleration techniques for DBSCAN are only available for low-dimensional Euclidean space. In this paper, we study the DBSCAN problem under the assumption that the inliers (the core points and border points) have a low intrinsic dimension (which is a realistic assumption for many high-dimensional applications), where the outliers can be located anywhere in the space without any assumption. First, we propose a $k$-center clustering-based algorithm that can reduce the time-consuming labeling and merging tasks of DBSCAN to be linear. Further, we propose a linear time…
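The abstract names k-center clustering as the building block, but the text available here does not describe the construction. As background only, the following is a minimal sketch of the classical greedy farthest-point heuristic for k-center, which works in any metric space because it only needs a distance function. It is not the paper's algorithm. The function names `edit_distance` and `greedy_k_center` are hypothetical, and edit distance is used simply because the abstract mentions it as an example metric.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def greedy_k_center(
    points: Sequence[T],
    k: int,
    dist: Callable[[T, T], float],
) -> Tuple[List[int], List[int], float]:
    """Greedy farthest-point traversal (illustrative background, not the paper's method).

    Returns (indices of chosen centers, index into that list for each point,
    covering radius). Uses at most k * len(points) distance evaluations.
    """
    if not points or k <= 0:
        raise ValueError("need at least one point and k >= 1")

    centers = [0]                                   # arbitrary first center
    nearest = [dist(points[0], p) for p in points]  # distance to closest center
    assign = [0] * len(points)                      # closest center per point

    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far] == 0:                       # every point is already a center copy
            break
        centers.append(far)
        for i, p in enumerate(points):
            d = dist(points[far], p)
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = len(centers) - 1

    return centers, assign, max(nearest)
```

Each round costs one pass over the input, so the total number of distance evaluations is at most k times the input size rather than quadratic in it. That is the general reason a small set of well-spread centers is attractive when each distance call is expensive, as with edit distance on texts.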
