Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms

Guanlin Mo; Shihong Song; Hu Ding

arXiv:2405.06899 · cs.DS · January 7, 2025

TL;DR

This paper introduces DBSCAN algorithms that are faster and more scalable in high-dimensional, general-metric, and streaming settings, under the assumption that the inliers have low intrinsic dimension.

Contribution

It presents linear-time exact and approximate DBSCAN algorithms, including a streaming implementation, under the assumption that the inliers have low intrinsic dimension.

Findings

01. Significantly reduces computational complexity in high-dimensional spaces.

02. Works in streaming settings with memory usage independent of the input size.

03. Experiments show improved speed over existing DBSCAN methods.

Abstract

DBSCAN is a popular density-based clustering algorithm with many practical applications. However, the running time of DBSCAN in high-dimensional space or general metric space (e.g., clustering a set of texts by using edit distance) can be as large as quadratic in the input size. Moreover, most existing acceleration techniques for DBSCAN are only available for low-dimensional Euclidean space. In this paper, we study the DBSCAN problem under the assumption that the inliers (the core points and border points) have a low intrinsic dimension (a realistic assumption for many high-dimensional applications), while the outliers may lie anywhere in the space, with no assumption placed on them. First, we propose a $k$-center clustering based algorithm that reduces the time-consuming labeling and merging tasks of DBSCAN to linear time. Further, we propose a linear time…
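To make the quadratic cost mentioned in the abstract concrete, here is a minimal sketch of the textbook DBSCAN procedure in a general metric space, using edit distance on strings as in the abstract's example. It is a baseline for illustration only, not the authors' $k$-center based algorithm. The names `naive_metric_dbscan`, `edit_distance`, `eps`, and `min_pts` are assumptions chosen for this example, and the sketch follows the common convention that a point counts as a member of its own neighborhood.

```python
from collections import deque
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
NOISE = -1  # label used for outliers


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def naive_metric_dbscan(points: Sequence[T],
                        dist: Callable[[T, T], float],
                        eps: float,
                        min_pts: int) -> List[int]:
    """Textbook DBSCAN with an arbitrary metric (hypothetical helper).

    Returns one label per point: a cluster id >= 0, or NOISE.
    """
    n = len(points)
    # Quadratic step: every pair of points is compared exactly once.
    neighbors: List[List[int]] = [[i] for i in range(n)]  # self is a neighbor
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= eps:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Labeling: a core point has at least min_pts points in its eps-ball.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    # Merging: grow each cluster by breadth-first search from core points.
    labels = [NOISE] * n
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != NOISE:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    if is_core[q]:  # border points join but do not expand
                        queue.append(q)
        cluster += 1
    return labels


words = ["cat", "bat", "hat", "dog", "dot", "dig", "xylophone"]
labels = naive_metric_dbscan(words, edit_distance, eps=1, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this toy run, "dot" and "dig" are border points attached to the core point "dog", and "xylophone" is an outlier. The nested loop over all pairs is what makes the baseline quadratic in the number of distance evaluations, which is costly when each evaluation is itself expensive, as with edit distance.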

Code & Models

Repositories

moguanlin/towards-metric-dbscan (official, PyTorch)
