PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search
Shangdi Yu, Joshua Engels, Yihao Huang, Julian Shun

TL;DR
PECANN introduces a scalable, parallel framework for density-based clustering of large high-dimensional datasets, leveraging graph-based approximate nearest neighbor search to significantly improve speed over existing methods.
Contribution
This work unifies density peaks clustering variants into a single framework and proposes an efficient, parallel predicate search technique using graph-based ANNS.
Findings
PECANN achieves up to 734x speedup over state-of-the-art sequential algorithms.
It is two orders of magnitude faster than existing parallel algorithms that are specialized for low-dimensional data.
It was evaluated on large high-dimensional image and text datasets with up to 1.28 million points.
Abstract
This paper studies density-based clustering of point sets. These methods use dense regions of points to detect clusters of arbitrary shapes. In particular, we study variants of density peaks clustering, a popular type of algorithm that has been shown to work well in practice. Our goal is to cluster large high-dimensional datasets, which are prevalent in practice. Prior solutions are either sequential, and cannot scale to large data, or are specialized for low-dimensional data. This paper unifies the different variants of density peaks clustering into a single framework, PECANN, by abstracting out several key steps common to this class of algorithms. One such key step is to find nearest neighbors that satisfy a predicate function, and one of the main contributions of this paper is an efficient way to do this predicate search using graph-based approximate nearest neighbor search (ANNS).…
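To make the abstract's key steps concrete, here is a minimal sequential sketch of density peaks clustering in Python. It is not the PECANN implementation. It uses exact brute-force search where PECANN uses graph-based ANNS, and it assumes one particular variant: density is the inverse of the distance to the k-th nearest neighbor, and the cluster centers are the points with the largest distance to their nearest denser point. The function name `density_peaks_sketch` and its parameters `k` and `num_clusters` are hypothetical and chosen only for illustration.

```python
import math

def density_peaks_sketch(points, k=2, num_clusters=2):
    """Brute-force density peaks clustering (illustrative sketch only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: density of each point. Assumption: inverse distance to the
    # k-th nearest neighbor (index 0 of the sorted row is the point itself).
    density = []
    for i in range(n):
        kth = sorted(dist[i])[k]
        density.append(1.0 / kth if kth > 0 else math.inf)

    # Predicate: j is "denser" than i. Ties are broken by index so that the
    # relation is a strict total order.
    def denser(j, i):
        return (density[j], -j) > (density[i], -i)

    # Step 2: predicate search. For each point, find its nearest neighbor
    # among the points that satisfy the predicate (the "dependent point").
    dependent = [None] * n
    dep_dist = [math.inf] * n
    for i in range(n):
        for j in range(n):
            if denser(j, i) and dist[i][j] < dep_dist[i]:
                dependent[i], dep_dist[i] = j, dist[i][j]

    # Step 3: pick centers. Assumption: the points farthest from any denser
    # point; the globally densest point has distance infinity.
    centers = sorted(range(n), key=lambda i: dep_dist[i], reverse=True)
    centers = centers[:num_clusters]

    # Step 4: every other point inherits the label of its dependent point.
    # Processing in decreasing density order guarantees the dependent point
    # is already labeled.
    labels = [None] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    order = sorted(range(n), key=lambda i: (density[i], -i), reverse=True)
    for i in order:
        if labels[i] is None:
            labels[i] = labels[dependent[i]]
    return labels
```

Step 2 is the predicate search the abstract refers to: a nearest neighbor query restricted to points that satisfy a condition, here "has higher density". The double loop makes this sketch quadratic in the number of points, which is the cost that an ANNS-based predicate search is meant to avoid on large high-dimensional data.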
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Graph Theory and Algorithms
