TL;DR
This paper introduces a parallel, index-based SCAN algorithm for graph clustering that significantly accelerates query times and index construction on large graphs, with provable efficiency and approximate clustering guarantees.
Contribution
It presents a practical parallel index-based SCAN algorithm built on GS*-Index, together with an LSH-based approximation. The result is faster and more scalable than existing algorithms and comes with theoretical guarantees.
Findings
Parallel index construction achieves a 50-151x speedup over the sequential GS*-Index.
Parallel query processing is 5-32x faster than the sequential GS*-Index.
LSH-based approximation maintains clustering quality while speeding up index creation.
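To make the LSH finding concrete, here is a minimal sketch of how a locality-sensitive hash can estimate neighborhood similarity without intersecting the neighborhoods directly. It uses MinHash, which estimates Jaccard similarity; this summary does not say which LSH family or similarity measure the paper uses, so the scheme, the function names (`minhash_signature`, `estimate_jaccard`), and the choice of 256 hash functions are all assumptions made for illustration.

```python
import hashlib


def _hash(value, seed):
    """Hash `value` to a 64-bit integer; `seed` selects one hash function of the family."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=256):
    """Return the MinHash signature of a non-empty set (hypothetical helper).

    Entry i is the smallest value of hash function i over the set.
    """
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature entries."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Two neighborhoods that share 20 of 40 distinct vertices (true Jaccard = 0.5).
neighborhood_a = set(range(0, 30))
neighborhood_b = set(range(10, 40))
estimate = estimate_jaccard(
    minhash_signature(neighborhood_a), minhash_signature(neighborhood_b)
)
```

Each signature has a fixed length regardless of vertex degree, so comparing two high-degree vertices costs the same as comparing two low-degree ones. More hash functions lower the variance of the estimate at the cost of more hashing work.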
Abstract
SCAN (Structural Clustering Algorithm for Networks) is a well-studied, widely used graph clustering algorithm. For large graphs, however, sequential SCAN variants are prohibitively slow, and parallel SCAN variants do not effectively share work among queries with different SCAN parameter settings. Since users of SCAN often explore many parameter settings to find good clusterings, it is worthwhile to precompute an index that speeds up queries. This paper presents a practical and provably efficient parallel index-based SCAN algorithm based on GS*-Index, a recent sequential algorithm. Our parallel algorithm improves upon the asymptotic work of the sequential algorithm by using integer sorting. It is also highly parallel, achieving logarithmic span (parallel time) for both index construction and clustering queries. Furthermore, we apply locality-sensitive hashing (LSH) to design a novel…
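The value of an index across parameter settings can be sketched briefly. The code below assumes the standard SCAN definitions: the structural similarity of adjacent vertices is the cosine similarity of their closed neighborhoods, and a vertex is a core when at least mu vertices in its closed neighborhood (itself included) have similarity of at least epsilon to it. It is a sequential toy, not the paper's parallel algorithm or the GS*-Index data structure, and the function names are hypothetical.

```python
from math import sqrt


def closed_neighborhoods(edges):
    """Map each vertex to its closed neighborhood: its neighbors plus itself."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def build_index(edges):
    """Compute the similarity of every edge once; this step ignores eps and mu."""
    nbrs = closed_neighborhoods(edges)
    index = {}
    for u, v in edges:
        shared = len(nbrs[u] & nbrs[v])
        index[frozenset((u, v))] = shared / sqrt(len(nbrs[u]) * len(nbrs[v]))
    return index, nbrs


def core_vertices(index, nbrs, eps, mu):
    """Answer one (eps, mu) query from the precomputed similarities."""
    cores = set()
    for v, closed in nbrs.items():
        # Count v itself (similarity 1) plus every neighbor that is eps-similar.
        similar = 1 + sum(
            1 for w in closed if w != v and index[frozenset((v, w))] >= eps
        )
        if similar >= mu:
            cores.add(v)
    return cores


# Two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
index, nbrs = build_index(edges)
tight = core_vertices(index, nbrs, eps=0.8, mu=3)
loose = core_vertices(index, nbrs, eps=0.5, mu=4)
```

Both queries reuse the same `index`, so the similarity computation is paid once however many parameter settings are explored.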
