Parallel Index-Based Structural Graph Clustering and Its Approximation

Tom Tseng; Laxman Dhulipala; Julian Shun

arXiv:2012.11188 · cs.DB · April 1, 2021

TL;DR

This paper introduces a parallel, index-based SCAN algorithm for graph clustering that substantially speeds up both index construction and clustering queries on large graphs, with provable efficiency and approximate clustering guarantees.

Contribution

It presents a practical parallel index-based SCAN algorithm built on GS*-Index: integer sorting improves on the sequential algorithm's asymptotic work, index construction and queries both have logarithmic span, and LSH is applied to speed up index construction further.

Findings

01 Parallel index construction is 50-151x faster than GS*-Index construction.

02 Parallel query processing is 5-32x faster than GS*-Index queries.

03 LSH-based approximation speeds up index construction while maintaining clustering quality.
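Finding 03 mentions an LSH-based approximation. This page does not say which LSH family the paper uses; the sketch below assumes random-hyperplane hashing (SimHash), a standard LSH family for cosine similarity. Treating each closed neighborhood N[v] = N(v) ∪ {v} as a 0/1 vector, the cosine of the angle between two such vectors equals the SCAN similarity |N[u] ∩ N[v]| / sqrt(|N[u]| · |N[v]|), and a random hyperplane separates the two vectors with probability angle/π, so the fraction of differing signature bits estimates the angle. The names simhash_signatures and estimated_similarity are hypothetical.

```python
import math
import random


def simhash_signatures(adj, num_bits, seed=0):
    """Hypothetical sketch: SimHash (assumed LSH family) over closed neighborhoods."""
    rng = random.Random(seed)
    # One random Gaussian hyperplane per signature bit, with one coordinate per vertex.
    planes = [{w: rng.gauss(0.0, 1.0) for w in adj} for _ in range(num_bits)]
    signatures = {}
    for v in adj:
        closed = adj[v] | {v}  # N[v], viewed as a 0/1 indicator vector
        # Bit b records which side of hyperplane b the indicator vector falls on.
        signatures[v] = [sum(plane[w] for w in closed) >= 0.0 for plane in planes]
    return signatures


def estimated_similarity(signatures, u, v):
    """Estimate cos(angle(N[u], N[v])) from the fraction of differing bits."""
    differing = sum(a != b for a, b in zip(signatures[u], signatures[v]))
    angle = math.pi * differing / len(signatures[u])
    return math.cos(angle)


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    sigs = simhash_signatures(adj, num_bits=4096, seed=1)
    print(estimated_similarity(sigs, 0, 1))  # 1.0 (identical closed neighborhoods)
    print(estimated_similarity(sigs, 2, 3))  # near the exact similarity 0.5
```

Each vertex's signature is computed once, after which estimating an edge's similarity costs one pass over the k signature bits regardless of the endpoints' degrees; increasing k tightens the estimate.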

Abstract

SCAN (Structural Clustering Algorithm for Networks) is a well-studied, widely used graph clustering algorithm. For large graphs, however, sequential SCAN variants are prohibitively slow, and parallel SCAN variants do not effectively share work among queries with different SCAN parameter settings. Since users of SCAN often explore many parameter settings to find good clusterings, it is worthwhile to precompute an index that speeds up queries. This paper presents a practical and provably efficient parallel index-based SCAN algorithm based on GS*-Index, a recent sequential algorithm. Our parallel algorithm improves upon the asymptotic work of the sequential algorithm by using integer sorting. It is also highly parallel, achieving logarithmic span (parallel time) for both index construction and clustering queries. Furthermore, we apply locality-sensitive hashing (LSH) to design a novel…
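The abstract's central idea is to precompute an index once so that SCAN queries with many (μ, ε) settings reuse work. The following is a minimal sequential Python sketch of what we understand to be the GS*-Index-style approach: each vertex's neighbors sorted by structural similarity (a neighbor order) and, for each μ, vertices sorted by their μ-th largest similarity (a core order), so a query only reads prefixes of sorted lists. Assumptions not taken from this page: cosine similarity over closed neighborhoods N[v] = N(v) ∪ {v}, a vertex counting itself toward μ, each border vertex assigned to a single cluster arbitrarily, and the hypothetical names structural_similarity, build_index and query. It uses comparison sorting and runs sequentially, so it does not reflect the paper's integer-sorting, logarithmic-span parallel algorithm.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sketch; assumes cosine similarity over closed neighborhoods.


def structural_similarity(adj, u, v):
    """SCAN cosine similarity over closed neighborhoods N[x] = N(x) | {x}."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def build_index(adj):
    """Sequential sketch of a GS*-Index-style index (neighbor order + core order)."""
    # Neighbor order: each vertex's neighbors sorted by decreasing similarity.
    neighbor_order = {
        v: sorted(((structural_similarity(adj, v, u), u) for u in adj[v]), reverse=True)
        for v in adj
    }
    # Closed-neighborhood similarities, counting v itself with similarity 1.0.
    closed = {v: [1.0] + [s for s, _ in neighbor_order[v]] for v in adj}
    max_mu = max(len(c) for c in closed.values())
    # Core order: for each mu, vertices sorted by their mu-th largest similarity;
    # v is a core for (mu, eps) exactly when that value is >= eps.
    core_order = {
        mu: sorted(((c[mu - 1], v) for v, c in closed.items() if len(c) >= mu), reverse=True)
        for mu in range(1, max_mu + 1)
    }
    return neighbor_order, core_order


def query(index, mu, eps):
    """Answer one SCAN query (mu, eps) by reading prefixes of the index."""
    neighbor_order, core_order = index
    # The cores are a prefix of the core order for this mu.
    cores = []
    for threshold, v in core_order.get(mu, []):
        if threshold < eps:
            break
        cores.append(v)
    core_set = set(cores)
    parent = {v: v for v in cores}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def similar_neighbors(v):
        # The eps-similar neighbors of v are a prefix of its neighbor order.
        for s, u in neighbor_order[v]:
            if s < eps:
                return
            yield u

    # Union cores that are joined by eps-similar edges.
    for v in cores:
        for u in similar_neighbors(v):
            if u in core_set:
                parent[find(u)] = find(v)
    clusters = defaultdict(set)
    for v in cores:
        clusters[find(v)].add(v)
    # Attach each non-core (border) vertex to one eps-similar core's cluster.
    assigned = set()
    for v in sorted(cores):
        for u in similar_neighbors(v):
            if u not in core_set and u not in assigned:
                clusters[find(v)].add(u)
                assigned.add(u)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    index = build_index(adj)  # built once, reused by every query below
    print(query(index, 3, 0.7))  # [[0, 1, 2], [3, 4, 5]]
    print(query(index, 4, 0.5))  # [[0, 1, 2, 3, 4, 5]]
```

Because both orders are sorted, each query stops scanning a list as soon as it sees a value below ε, which is what lets many parameter settings share one precomputed index.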

Code & Models

Repositories

ParAlg/gbbs/tree/master/benchmarks/SCAN/IndexBased
Official
