PS-DBSCAN: An Efficient Parallel DBSCAN Algorithm Based on Platform Of AI (PAI)
Xu Hu, Jun Huang, Minghui Qiu, Cen Chen, Wei Chu

TL;DR
PS-DBSCAN is a parallel DBSCAN clustering algorithm for distributed environments. It reduces the communication cost of merging clusters across workers, achieves a 2-10 times speedup in communication efficiency over PDSDBSCAN, and is released in Alibaba Cloud's Platform of AI (PAI).
Contribution
The paper introduces PS-DBSCAN, a novel parallel DBSCAN algorithm that employs a global union approach to improve communication efficiency in distributed settings.
Findings
Achieves a 2-10 times speedup in communication efficiency over PDSDBSCAN
Reduces communication costs in distributed clustering
Successfully implemented in Alibaba Cloud's Platform of AI
Abstract
We present PS-DBSCAN, a communication-efficient parallel DBSCAN algorithm that combines the disjoint-set data structure and the Parameter Server framework in Platform of AI (PAI). Since data points within the same cluster may be distributed over different workers, which results in several disjoint-sets, merging them incurs large communication costs. In our algorithm, we employ a fast global union approach to union the disjoint-sets and alleviate the communication burden. Experiments on datasets of different scales demonstrate that PS-DBSCAN outperforms PDSDBSCAN, with a 2-10 times speedup in communication efficiency. We have released PS-DBSCAN in an algorithm platform called Platform of AI (PAI - https://pai.base.shuju.aliyun.com/) in Alibaba Cloud. We have also demonstrated how to use the method in PAI.
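The merging problem the abstract describes can be illustrated with a small single-process sketch. It is not the paper's fast global union and does not use any PAI or Parameter Server API. It only shows how per-worker disjoint-sets that each see part of a cluster can be combined into one global labelling. The names `DisjointSet` and `merge_worker_edges` are hypothetical, and the neighbor edges are assumed to have already been produced by DBSCAN's neighborhood queries between core points.

```python
class DisjointSet:
    """Union-find over point ids 0..n-1; the smallest id in a set is its root."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Keep the smaller id as root so every worker agrees on the label.
        if ra < rb:
            self.parent[rb] = ra
        else:
            self.parent[ra] = rb


def merge_worker_edges(n_points, edges_per_worker):
    """Build one disjoint-set per (simulated) worker, then union them globally.

    edges_per_worker: list of edge lists; each edge (a, b) links two points
    that one worker found to be density-connected (assumed input).
    """
    local_sets = []
    for edges in edges_per_worker:
        ds = DisjointSet(n_points)
        for a, b in edges:
            ds.union(a, b)
        local_sets.append(ds)

    # Global union: link every point to the root its worker assigned to it.
    global_ds = DisjointSet(n_points)
    for ds in local_sets:
        for p in range(n_points):
            global_ds.union(p, ds.find(p))
    return [global_ds.find(p) for p in range(n_points)]


# Worker 0 sees links 0-1 and 2-3; worker 1 sees links 1-2 and 4-5.
labels = merge_worker_edges(6, [[(0, 1), (2, 3)], [(1, 2), (4, 5)]])
print(labels)  # [0, 0, 0, 0, 4, 4]
```

Neither worker alone knows that points 0 through 3 form one cluster; that only emerges after the global step. In a real distributed run, that step is where the workers have to exchange their labels, which is the communication cost the abstract sets out to reduce.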
Taxonomy
Topics: Graph Theory and Algorithms · Cloud Computing and Resource Management · Advanced Neural Network Applications
