Scalable Community Detection via Parallel Correlation Clustering
Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Łącki, Vahab Mirrokni

TL;DR
This paper presents a scalable parallel framework for community detection in large graphs, achieving high-quality clustering with significant speedups over existing methods on billion-edge datasets.
Contribution
The authors develop a generalized parallel framework based on LambdaCC that scales to billion-edge graphs, improving speed and quality trade-offs in community detection.
Findings
Achieves up to 28.44x speedup over sequential baselines.
Scales to graphs with billions of edges.
Maintains or improves clustering quality.
Abstract
Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LambdaCC objective (introduced by Veldt et al.), which encompasses modularity and correlation clustering. Our framework consists of highly-optimized implementations that scale to large data sets of billions of edges and that obtain high-quality clusters compared to ground-truth data, on both unweighted and weighted graphs. Our empirical evaluation shows that this framework improves the state-of-the-art…
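To make the LambdaCC objective mentioned in the abstract concrete, here is a minimal Python sketch of the standard unweighted form from Veldt et al.: a cut edge costs 1 - λ, and a pair of non-adjacent nodes placed in the same cluster costs λ. This is an illustration, not the authors' implementation. The function name `lambdacc_disagreements` and the toy graph are assumptions, and the brute-force loop over all pairs is only practical for tiny graphs.

```python
from itertools import combinations


def lambdacc_disagreements(nodes, edges, labels, lam):
    """Standard unweighted LambdaCC disagreement cost (lower is better).

    Hypothetical helper for illustration only: it enumerates all node
    pairs, so it runs in O(n^2) time and does not scale to large graphs.
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        same_cluster = labels[u] == labels[v]
        if frozenset((u, v)) in edge_set:
            if not same_cluster:
                cost += 1.0 - lam  # an edge was cut
        elif same_cluster:
            cost += lam  # a non-adjacent pair was clustered together
    return cost


# Toy graph (assumed): two triangles joined by the single edge (2, 3).
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_cluster = {v: 0 for v in nodes}
singletons = {v: v for v in nodes}

for name, labels in [("two", two_clusters), ("one", one_cluster), ("singletons", singletons)]:
    print(name, lambdacc_disagreements(nodes, edges, labels, lam=0.25))
```

With λ = 0.25 the two-triangle clustering has the lowest cost of the three. Raising λ penalizes grouping non-adjacent nodes more heavily and so favors smaller, denser clusters, which is how a single resolution parameter lets the objective interpolate between coarser and finer clusterings.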
Taxonomy
Topics: Complex Network Analysis Techniques · Caching and Content Delivery · Advanced Clustering Algorithms Research
