Learned Accelerator Framework for Angular-Distance-Based High-Dimensional DBSCAN
Yifan Wang, Daisy Zhe Wang

TL;DR
This paper introduces LAF, a learned accelerator framework that significantly improves the efficiency and quality of high-dimensional DBSCAN clustering using angular distance, by predicting core points and correcting false negatives.
Contribution
The paper presents a novel learned framework with a cardinality estimator and post-processing to accelerate high-dimensional DBSCAN clustering, addressing performance issues in high-dimensional spaces.
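As a point of reference, the quantity a cardinality estimator would approximate in this setting is the size of a point's angular range query, which is what decides whether the point is core in DBSCAN. The sketch below computes that count exactly with NumPy. It assumes angular distance means the angle in radians between two vectors and that the count includes the point itself; the function names `angular_distance`, `range_count`, and `is_core` are illustrative and not taken from the paper.

```python
import numpy as np

def angular_distance(x, y):
    """Angle between vectors x and y in radians, in [0, pi]."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def range_count(data, i, eps):
    """Exact number of points (including point i) within angle eps of data[i]."""
    unit = data / np.linalg.norm(data, axis=1, keepdims=True)
    cos = np.clip(unit @ unit[i], -1.0, 1.0)
    return int(np.sum(np.arccos(cos) <= eps))

def is_core(data, i, eps, min_pts):
    """Standard DBSCAN core test: at least min_pts points in the eps-range."""
    return range_count(data, i, eps) >= min_pts
```

Each call to `range_count` scans the whole dataset, which is the per-point cost that a fast prediction would aim to avoid.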
Findings
LAF outperforms state-of-the-art DBSCAN variants in efficiency.
LAF improves clustering quality in high-dimensional data.
The framework effectively reduces unnecessary computations.
Abstract
Density-based clustering is a commonly used tool in data science. Many data science applications now use high-dimensional neural embeddings. However, traditional density-based clustering techniques such as DBSCAN perform poorly on high-dimensional data. In this paper, we propose LAF, a generic learned accelerator framework that speeds up the original DBSCAN and its sampling-based variants on high-dimensional data under the angular distance metric. The framework consists of a learned cardinality estimator and a post-processing module. The cardinality estimator quickly predicts whether a data point is a core point, so that unnecessary range queries can be skipped, while the post-processing module detects false negative predictions and merges clusters that were wrongly separated. The evaluation shows our LAF-enhanced DBSCAN method outperforms the state-of-the-art efficient DBSCAN variants…
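The two components described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. `predict_count` is a hypothetical stand-in for a learned cardinality estimator (any callable mapping a point index to an estimated neighbor count), and the repair step is one simple heuristic chosen for illustration: a point predicted non-core that appears in the neighborhoods of at least `min_pts - 1` confirmed core points must itself be core, so every cluster that reached it is merged. Angular distance is assumed to be the angle in radians.

```python
import numpy as np

def angular_neighbors(unit, i, eps):
    """Indices within angle eps (radians) of point i; rows of unit are unit vectors."""
    cos = np.clip(unit @ unit[i], -1.0, 1.0)
    return np.flatnonzero(np.arccos(cos) <= eps)

def gated_dbscan(data, eps, min_pts, predict_count):
    """DBSCAN that only runs range queries on points predicted to be core.

    Returns (labels, number_of_range_queries); label -1 means noise.
    """
    unit = data / np.linalg.norm(data, axis=1, keepdims=True)
    n = len(unit)
    predicted_core = np.array([predict_count(i) >= min_pts for i in range(n)])
    labels = np.full(n, -1)
    queried = np.zeros(n, dtype=bool)
    hits = np.zeros(n, dtype=int)        # confirmed cores that saw each point
    seen_in = [set() for _ in range(n)]  # clusters of those confirmed cores
    n_queries, cluster = 0, 0

    for seed in range(n):
        if labels[seed] != -1 or queried[seed] or not predicted_core[seed]:
            continue  # skip: already handled, or predicted non-core
        stack, grew = [seed], False
        while stack:
            p = stack.pop()
            if queried[p]:
                continue
            queried[p] = True
            nbrs = angular_neighbors(unit, p, eps)
            n_queries += 1
            if len(nbrs) < min_pts:
                continue  # false positive: p is not core, do not expand
            grew = True
            if labels[p] == -1:
                labels[p] = cluster
            for q in nbrs:
                if q == p:
                    continue
                hits[q] += 1
                seen_in[q].add(cluster)
                if labels[q] == -1:
                    labels[q] = cluster
                    if predicted_core[q]:
                        stack.append(q)
        if grew:
            cluster += 1

    # Repair step (illustrative heuristic): merge clusters bridged by a missed core.
    parent = list(range(cluster))

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for q in range(n):
        if not predicted_core[q] and hits[q] + 1 >= min_pts and len(seen_in[q]) > 1:
            roots = [find(c) for c in seen_in[q]]
            for r in roots[1:]:
                parent[r] = roots[0]
    for i in range(n):
        if labels[i] != -1:
            labels[i] = find(labels[i])
    return labels, n_queries
```

In this sketch a missed core point is never expanded, so noise points reachable only through it stay unlabeled; the repair only rejoins clusters that were split. Cluster ids after merging are not guaranteed to be contiguous.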
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
