Massively Parallel Correlation Clustering in Bounded Arboricity Graphs
Mélanie Cambus, Davin Choo, Havu Miikonen, Jara Uitto

TL;DR
This paper presents a fast parallel algorithm for correlation clustering on inputs whose positive edges form a bounded-arboricity graph. It achieves a 3-approximation in $O(\log \lambda \times \mathrm{poly}(\log \log n))$ MPC rounds, where $\lambda$ is the arboricity and $n$ is the number of vertices.
Contribution
It introduces a novel 3-approximation algorithm for correlation clustering in bounded arboricity graphs within the MPC model, combining structural graph properties with randomized greedy techniques.
Findings
Runs in $O(\log \lambda \times \mathrm{poly}(\log \log n))$ MPC rounds
Provides exact and $(1+\epsilon)$-approximate algorithms for forests
Achieves efficient parallel clustering for large-scale data in bounded arboricity graphs
Abstract
Identifying clusters of similar elements in a set is a common task in data analysis. With the immense growth of data and physical limitations on single processor speed, it is necessary to find efficient parallel algorithms for clustering tasks. In this paper, we study the problem of correlation clustering in bounded arboricity graphs with respect to the Massively Parallel Computation (MPC) model. More specifically, we are given a complete graph where the edges are either positive or negative, indicating whether pairs of vertices are similar or dissimilar. The task is to partition the vertices into clusters with as few disagreements as possible. That is, we want to minimize the number of positive inter-cluster edges and negative intra-cluster edges. Consider an input graph on $n$ vertices such that the positive edges induce a $\lambda$-arboric graph. Our main result is a…
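To make the objective in the abstract concrete, here is a minimal Python sketch that counts the disagreements of a given clustering: positive edges that cross clusters plus negative edges inside a cluster. It is a brute-force sequential illustration of the cost function only, not the paper's MPC algorithm. The function name `count_disagreements`, the vertex labelling 0..n-1, and the example path graph are all assumptions made for illustration.

```python
from itertools import combinations


def count_disagreements(n, positive_edges, cluster_of):
    """Count disagreements of a clustering on a complete signed graph.

    n              -- number of vertices, labelled 0..n-1 (illustrative convention)
    positive_edges -- iterable of (u, v) pairs; every other pair is negative
    cluster_of     -- list mapping each vertex to a cluster id
    """
    positive = {frozenset(edge) for edge in positive_edges}
    disagreements = 0
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        # A pair disagrees when it is positive but split across clusters,
        # or negative but placed in the same cluster.
        if is_positive != same_cluster:
            disagreements += 1
    return disagreements


# Hypothetical input: positive edges form the path 0-1-2-3,
# which is a forest and therefore has arboricity 1.
path = [(0, 1), (1, 2), (2, 3)]

singletons = count_disagreements(4, path, [0, 1, 2, 3])   # every vertex alone
one_cluster = count_disagreements(4, path, [0, 0, 0, 0])  # all vertices together
matched = count_disagreements(4, path, [0, 0, 1, 1])      # clusters {0,1} and {2,3}
```

On this small path, putting every vertex in its own cluster cuts all three positive edges, while a single cluster contains all three negative pairs. Pairing adjacent vertices leaves only the positive edge (1, 2) cut.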
