Parallel Correlation Clustering on Big Graphs
Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan

TL;DR
This paper introduces two parallel algorithms, C4 and ClusterWild!, for correlation clustering on large graphs, achieving near-linear speedups and high accuracy, significantly reducing runtime compared to existing methods.
Contribution
The paper presents novel parallel algorithms for correlation clustering that run in polylogarithmic rounds, with provable guarantees and superior scalability on large graphs.
Findings
C4 guarantees a 3-approximation ratio with serializability.
ClusterWild! is coordination-free: it gives up consistency to scale better, at the cost of a provably small loss in the 3-approximation ratio.
Both algorithms outperform state-of-the-art methods in accuracy and speed, clustering billion-edge graphs in under 5 seconds.
Abstract
Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices, and obtains a 3-approximation ratio. Unfortunately, KwikCluster in practice requires a large number of clustering rounds, a potential bottleneck for large graphs. We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably. C4 uses concurrency control to enforce serializability of a parallel clustering process, and guarantees a 3-approximation ratio. ClusterWild! is a coordination free algorithm that abandons consistency for the benefit of better scaling; this leads to a provably small loss in the 3-approximation ratio. We provide…
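The serial baseline described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the graph is given as a dictionary of positive ("similar") neighbor sets, that every pair not listed counts as dissimilar, and that pivots are visited in a uniformly random order. The names `kwik_cluster` and `disagreements` are hypothetical and chosen for this example only.

```python
import random


def kwik_cluster(adj, seed=0):
    """Serial pivot clustering in the style of KwikCluster (sketch).

    adj maps each vertex to the set of vertices it is similar to
    (assumed symmetric). Returns a list of disjoint clusters (sets).
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)  # assumption: pivots come from a random permutation

    unclustered = set(adj)
    clusters = []
    for pivot in order:
        if pivot not in unclustered:
            continue  # already absorbed by an earlier pivot
        # The pivot takes all of its still-unclustered similar neighbors.
        cluster = {pivot} | (adj[pivot] & unclustered)
        unclustered -= cluster
        clusters.append(cluster)
    return clusters


def disagreements(adj, clusters):
    """Count similar pairs that are split plus dissimilar pairs that are merged."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    vertices = sorted(adj)
    cost = 0
    for i, u in enumerate(vertices):
        for w in vertices[i + 1:]:
            same_cluster = label[u] == label[w]
            similar = w in adj[u]
            if same_cluster != similar:
                cost += 1
    return cost


if __name__ == "__main__":
    # Toy graph: a triangle {0, 1, 2} and a separate similar pair {3, 4}.
    toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
    found = kwik_cluster(toy)
    print(found, disagreements(toy, found))
```

Each loop iteration that creates a cluster is one serial clustering round, which is the bottleneck the abstract refers to: the number of rounds grows with the number of clusters, and each round depends on the vertices removed by the previous ones.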
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Advanced Graph Neural Networks
