Learned Accelerator Framework for Angular-Distance-Based High-Dimensional DBSCAN

Yifan Wang; Daisy Zhe Wang

arXiv:2302.03136 · cs.IR · February 8, 2023

TL;DR

This paper introduces LAF, a learned accelerator framework that significantly improves the efficiency and quality of high-dimensional DBSCAN clustering using angular distance, by predicting core points and correcting false negatives.

Contribution

The paper presents a learned framework that combines a cardinality estimator with a post-processing module to accelerate DBSCAN clustering on high-dimensional data, where traditional DBSCAN performs poorly.
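To make the quantity being estimated concrete, the following is a minimal sketch of the exact core-point test that a cardinality estimator would approximate. It assumes angular distance is the angle in radians between two vectors (the paper may normalize it differently), and the names `angular_distance`, `neighbor_count`, `is_core`, `eps`, and `min_pts` are illustrative, not taken from the paper or its repository.

```python
import math

def angular_distance(u, v):
    """Angle between u and v in radians (0 = same direction, pi = opposite).

    Assumption: this is one common definition; other work uses angle/pi
    or a cosine-based distance instead.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_sim = dot / (norm_u * norm_v)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_sim)))

def neighbor_count(points, i, eps):
    """Exact range-query cardinality: points within eps of points[i], itself included."""
    return sum(1 for p in points if angular_distance(points[i], p) <= eps)

def is_core(points, i, eps, min_pts):
    """DBSCAN core-point test: at least min_pts points lie within eps."""
    return neighbor_count(points, i, eps) >= min_pts
```

Each exact count scans every point, which is the cost a learned estimator is meant to avoid for points it predicts to be non-core.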

Findings

01. LAF outperforms state-of-the-art DBSCAN variants in efficiency.

02. LAF improves clustering quality in high-dimensional data.

03. The cardinality estimator lets the framework skip unnecessary range queries.

Abstract

Density-based clustering is a commonly used tool in data science, and many data science applications now rely on high-dimensional neural embeddings. However, traditional density-based clustering techniques such as DBSCAN suffer degraded performance on high-dimensional data. In this paper, we propose LAF, a generic learned accelerator framework that speeds up the original DBSCAN and its sampling-based variants on high-dimensional data under the angular distance metric. The framework consists of a learned cardinality estimator and a post-processing module. The cardinality estimator quickly predicts whether a data point is a core point, so that unnecessary range queries can be skipped, while the post-processing module detects false negative predictions and merges falsely separated clusters. The evaluation shows that our LAF-enhanced DBSCAN method outperforms the state-of-the-art efficient DBSCAN variants…
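The gating idea in the abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: `estimate_count` is a hypothetical callable standing in for the learned cardinality estimator, angular distance is assumed to be the angle in radians, and the post-processing module is omitted because the abstract does not describe how it works.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

NOISE = -1
UNVISITED = -2

def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between u and v (assumed definition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def range_query(points, i: int, eps: float) -> List[int]:
    """Brute-force range query: indices of all points within eps of points[i]."""
    return [j for j, p in enumerate(points) if angular_distance(points[i], p) <= eps]

def gated_dbscan(
    points: Sequence[Sequence[float]],
    eps: float,
    min_pts: int,
    estimate_count: Callable[[int], float],  # hypothetical estimator stand-in
) -> Tuple[List[int], Dict[str, int]]:
    labels = [UNVISITED] * len(points)
    stats = {"range_queries": 0, "skipped": 0}

    def neighbors_if_core(i: int) -> Optional[List[int]]:
        # Gate: if the estimator predicts a non-core point, skip the range query.
        if estimate_count(i) < min_pts:
            stats["skipped"] += 1
            return None
        stats["range_queries"] += 1
        nbrs = range_query(points, i, eps)
        return nbrs if len(nbrs) >= min_pts else None

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        nbrs = neighbors_if_core(i)
        if nbrs is None:
            labels[i] = NOISE  # may be a false negative if the estimator erred
            continue
        labels[i] = cluster
        frontier = list(nbrs)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise point reclaimed as a border point
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            more = neighbors_if_core(j)
            if more is not None:
                frontier.extend(more)  # j is core: keep expanding the cluster
        cluster += 1
    return labels, stats
```

When the estimator underestimates a true core point, that point is never expanded, so one dense region can be left as noise or split into several clusters. This is the failure mode the abstract assigns to the post-processing module.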

Code & Models

Repositories

wyfunique/laf-dbscan
Official

Taxonomy

Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Neural Networks and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings