Enabling DBSCAN for Very Large-Scale High-Dimensional Spaces

Yongyu Wang

arXiv:2411.11421 · cs.CV · December 4, 2024

TL;DR

This paper adapts DBSCAN with spectral data compression, which removes redundancy and noise so that clustering large-scale, high-dimensional datasets becomes more efficient and more accurate.

Contribution

We propose a novel spectral data compression technique for DBSCAN that enhances its scalability and effectiveness in high-dimensional, large-scale data analysis.

Findings

01

Enhanced clustering accuracy on large datasets

02

Reduced computational complexity of DBSCAN

03

Effective noise and redundancy removal

Abstract

DBSCAN is one of the most important non-parametric unsupervised data analysis tools. By applying DBSCAN to a dataset, two key analytical results can be obtained: (1) clustering data points based on density distribution and (2) identifying outliers in the dataset. However, the time complexity of the DBSCAN algorithm is $O(n^{2}\beta)$, where $n$ is the number of data points and $\beta = O(D)$, with $D$ representing the dimensionality of the data space. As a result, DBSCAN becomes computationally infeasible when both $n$ and $D$ are large. In this paper, we propose a DBSCAN method based on spectral data compression, capable of efficiently processing datasets with a large number of data points ($n$) and high dimensionality ($D$). By preserving only the most critical structural information during the compression process, our method effectively removes substantial redundancy and noise.…
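
The $O(n^{2}\beta)$ cost quoted in the abstract is easiest to see in a brute-force implementation. The block below is a minimal Python sketch of standard DBSCAN, not the paper's compression-based method, whose details are not given on this page. The function name `naive_dbscan`, the parameter names `eps` and `min_pts`, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def naive_dbscan(X, eps, min_pts):
    """Brute-force DBSCAN sketch. Returns one label per row; -1 marks noise."""
    n = X.shape[0]

    # All pairwise Euclidean distances: n * n pairs, each costing O(D) work.
    # This step is the source of the O(n^2 * D) running time.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # neighbors[i, j] is True when j lies within eps of i (i counts itself).
    neighbors = dist <= eps
    core = neighbors.sum(axis=1) >= min_pts

    labels = np.full(n, -1, dtype=int)
    cluster = 0
    for i in range(n):
        # Only an unlabeled core point can start a new cluster.
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in np.flatnonzero(neighbors[p]):
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


# Illustrative data (assumed): two tight 16-dimensional blobs and one far point.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.1, size=(50, 16))
blob_b = rng.normal(5.0, 0.1, size=(50, 16))
far_point = np.full((1, 16), 20.0)
X = np.vstack([blob_a, blob_b, far_point])

labels = naive_dbscan(X, eps=1.5, min_pts=5)
print("clusters found:", labels.max() + 1)
print("noise points:", int((labels == -1).sum()))
```

The sketch produces both results the abstract mentions: cluster labels for dense regions and a noise label (`-1`) for outliers. Because the distance matrix alone needs $n^{2}$ entries, doubling $n$ roughly quadruples time and memory, which is the bottleneck a compression step applied before clustering is meant to relieve.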

Taxonomy

Topics: Advanced Data Compression Techniques · Medical Imaging Techniques and Applications

Methods: Sparse Evolutionary Training