Distributed Graph Clustering using Modularity and Map Equation
Michael Hamann, Ben Strasser, Dorothea Wagner, Tim Zeitz

TL;DR
This paper introduces two distributed algorithms, DSLM-Mod and DSLM-Map, for large-scale graph clustering that optimize modularity and the map equation, respectively, and shows that they are fast and produce high-quality clusterings on large real-world and synthetic graphs.
Contribution
The paper presents novel distributed algorithms for graph clustering optimizing modularity and map equation, scalable to billion-edge graphs, with improved speed and memory efficiency.
Findings
Algorithms are fast and produce high-quality clusters.
Compared to GossipMap, the proposed methods use less memory and are up to ten times faster.
Effective on graphs with up to 68 billion edges.
Abstract
We study large-scale, distributed graph clustering. Given an undirected graph, our objective is to partition the nodes into disjoint sets called clusters. A cluster should contain many internal edges while being sparsely connected to other clusters. In the context of a social network, a cluster could be a group of friends. Modularity and the map equation are established formalizations of this internally-dense-externally-sparse principle. We present two versions of a simple distributed algorithm to optimize both measures. They are based on Thrill, a distributed big data processing framework that implements an extended MapReduce model. The algorithms for the two measures, DSLM-Mod and DSLM-Map, differ only slightly. Adapting them for similar quality measures is straightforward. We conduct an extensive experimental study on real-world graphs and on synthetic benchmark graphs with up to 68…
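To make the internally-dense-externally-sparse principle concrete, the following is a minimal single-machine sketch of the standard modularity score for an undirected, unweighted graph. It is not the paper's distributed DSLM-Mod algorithm and does not use Thrill. The function name `modularity`, the edge-list input, and the example graph are assumptions chosen for illustration.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity of a clustering of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    cluster_of: dict mapping every node to its cluster id.
    (Both input formats are illustrative assumptions.)
    """
    m = 0
    internal = defaultdict(int)  # edges with both endpoints in the cluster
    volume = defaultdict(int)    # sum of node degrees in the cluster
    for u, v in edges:
        m += 1
        volume[cluster_of[u]] += 1
        volume[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            internal[cluster_of[u]] += 1
    if m == 0:
        return 0.0
    # Q = sum over clusters of (internal edge fraction) - (volume fraction)^2
    return sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "all" for node in range(6)}

q_split = modularity(edges, two_triangles)
q_single = modularity(edges, one_cluster)
```

In this example, splitting the graph into its two triangles scores higher than placing all nodes in one cluster, which matches the intuition that a good cluster has many internal edges and few edges to the outside.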
