Faster Parallel Exact Density Peaks Clustering
Yihao Huang, Shangdi Yu, Julian Shun

TL;DR
This paper introduces fast, parallel algorithms for exact Density Peaks Clustering, significantly improving scalability and speed over previous methods, enabling large-scale data analysis in various fields.
Contribution
It presents novel parallel algorithms for exact DPC that are work-efficient and have low span, outperforming existing methods in speed and scalability.
Findings
Achieves $O(\log n \log\log n)$ span with priority search kd-trees.
Realizes up to 13169x speedup over previous parallel algorithms.
Maintains work-efficiency matching the best sequential algorithms.
Abstract
Clustering multidimensional points is a fundamental data mining task, with applications in many fields, such as astronomy, neuroscience, bioinformatics, and computer vision. The goal of clustering algorithms is to group similar objects together. Density-based clustering is a clustering approach that defines clusters as dense regions of points. It has the advantage of being able to detect clusters of arbitrary shapes, rendering it useful in many applications. In this paper, we propose fast parallel algorithms for Density Peaks Clustering (DPC), a popular version of density-based clustering. Existing exact DPC algorithms suffer from low parallelism both in theory and in practice, which limits their application to large-scale data sets. Our most performant algorithm, which is based on priority search kd-trees, achieves $O(\log n \log\log n)$ span (parallel time complexity) for a data set…
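For readers unfamiliar with DPC, the following is a minimal brute-force sketch of what an exact DPC computation produces, assuming the standard formulation: density is the number of points within a cutoff radius, each point depends on its nearest higher-density point, and thresholds separate noise points and cluster centers. It is not the paper's algorithm. It is only the quadratic-time baseline whose output a faster exact method would have to reproduce. The function name, the parameter names (`d_cut`, `rho_min`, `delta_min`), and the tie-breaking by index are assumptions made for illustration.

```python
from math import dist, inf

def dpc_bruteforce(points, d_cut, rho_min, delta_min):
    """Quadratic-time DPC reference; names and tie-breaking are illustrative."""
    n = len(points)
    # Density: number of points within d_cut (each point counts itself).
    rho = [sum(1 for q in points if dist(p, q) <= d_cut) for p in points]

    # Assumed tie-break: equal densities are ordered by index.
    def rank(i):
        return (rho[i], i)

    # Dependent point: nearest point with strictly higher rank.
    dep = [-1] * n
    delta = [inf] * n
    for i in range(n):
        for j in range(n):
            if rank(j) > rank(i):
                d = dist(points[i], points[j])
                if d < delta[i]:
                    delta[i], dep[i] = d, j

    # Visit points from highest to lowest rank so that a dependent
    # point is always labeled before the points that depend on it.
    labels = [-1] * n  # -1 marks noise
    for i in sorted(range(n), key=rank, reverse=True):
        if rho[i] < rho_min:
            continue  # noise point: density below threshold
        if dep[i] == -1 or delta[i] >= delta_min:
            labels[i] = i  # cluster center: starts its own cluster
        else:
            labels[i] = labels[dep[i]]  # inherit the dependent point's cluster
    return rho, dep, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
rho, dep, labels = dpc_bruteforce(points, d_cut=1.5, rho_min=1, delta_min=5)
print(labels)
```

On this toy input the two well-separated groups of three points end up in two different clusters. The two nested loops over all pairs are the part that does not scale, which is the cost that tree-based approaches such as the priority search kd-tree mentioned above aim to avoid.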
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Face and Expression Recognition
