An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data
Thapana Boonchoo, Xiang Ao, Qing He

TL;DR
This paper introduces GDPAM, a novel density-based clustering algorithm that extends grid-based DBSCAN to higher-dimensional data, utilizing bitmap indexing and union-find to improve efficiency and scalability.
Contribution
GDPAM extends grid-based DBSCAN to high-dimensional data by employing bitmap indexing and union-find, reducing redundancies and improving efficiency.
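As a rough illustration of how union-find can cut down redundant merging, the sketch below uses a standard disjoint-set structure with path compression and union by rank. It is not taken from the paper: the class name `UnionFind` and the idea of treating each non-empty grid as one element are assumptions made for illustration only.

```python
class UnionFind:
    """Disjoint-set forest over n elements (hypothetical: one element per non-empty grid)."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on tree height, used to keep trees shallow

    def find(self, x):
        # First pass: locate the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, point every visited node at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the sets of a and b. Returns False if they were already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in one cluster, so the caller can skip this merge
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

In this sketch, a caller could test `find(a) == find(b)` before running an expensive merge check between two grids, and skip the check when both grids already belong to the same cluster.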
Findings
Outperforms state-of-the-art DBSCAN variants on high-dimensional datasets
Demonstrates good scalability on real-world and synthetic data
Reduces neighbour explosion and merging redundancies
Abstract
DBSCAN is a widely used clustering algorithm because it can find arbitrarily shaped clusters and is robust to outliers. The complexity of DBSCAN is O(n^2) in the worst case, and in practice its cost becomes more severe in higher dimensions. Grid-based DBSCAN is one of the recent improved algorithms aimed at improving efficiency. However, grid-based DBSCAN still suffers from two problems, neighbour explosion and redundancies in merging, which make the algorithm infeasible in high-dimensional space. In this paper, we propose a novel algorithm named GDPAM that attempts to extend grid-based DBSCAN to higher-dimensional data. In GDPAM, bitmap indexing is used to manage non-empty grids so that neighbour grid queries can be performed efficiently. Furthermore, we adopt an efficient union-find algorithm to maintain the clustering information…
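One way to picture a bitmap index over non-empty grids is sketched below. It is an illustrative assumption, not the paper's actual data structure. Each point is assigned to a cell of side eps / sqrt(d), the usual choice in grid-based DBSCAN. For every dimension, one bitmap per occupied cell coordinate records which non-empty grids have that coordinate. A neighbour query ORs the bitmaps within a coordinate range in each dimension, then ANDs the results across dimensions. The function names, the `reach` parameter, and the use of Python integers as bitmaps are all hypothetical.

```python
import math
from collections import defaultdict


def build_grid_bitmaps(points, eps):
    """Assign points to cells and build per-dimension bitmaps over non-empty grids."""
    d = len(points[0])
    width = eps / math.sqrt(d)  # cell diagonal equals eps under this choice
    cells = {}                  # cell coordinates -> grid id
    members = []                # grid id -> indices of points in that grid
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / width) for x in p)
        if key not in cells:
            cells[key] = len(members)
            members.append([])
        members[cells[key]].append(idx)
    # bitmaps[dim][c] has bit g set when grid g has coordinate c in dimension dim
    bitmaps = [defaultdict(int) for _ in range(d)]
    for key, gid in cells.items():
        for dim, c in enumerate(key):
            bitmaps[dim][c] |= 1 << gid
    return cells, members, bitmaps


def neighbour_grids(key, bitmaps, reach):
    """Return ids of non-empty grids within `reach` cells of `key` in every dimension."""
    result = -1  # all bits set; narrowed by an AND per dimension
    for dim, c in enumerate(key):
        mask = 0
        for v in range(c - reach, c + reach + 1):
            mask |= bitmaps[dim].get(v, 0)  # OR together the allowed coordinates
        result &= mask
        if result == 0:
            break  # no candidate grid survives, so stop early
    return [g for g in range(result.bit_length()) if (result >> g) & 1]
```

With `reach = math.ceil(math.sqrt(d))`, the returned grids form a superset of the true neighbours, so point-level distance checks would still be needed afterwards. The query cost in this sketch depends on the number of non-empty grids and the number of dimensions, and it never enumerates the empty cells around `key`.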
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Advanced Image and Video Retrieval Techniques
